Introduction


Prostate  cancer  (PCa)  is  the  leading  cancer  among  men  in  the  United  States,  and  is  a  disease  with 
strong  genetic  susceptibility.  The  genetic  susceptibility  is  due  to  the  inheritance  of  altered  germline 
DNA  sequences,  either  in  the  form  of  point  mutations  such  as  single  nucleotide  polymorphisms 
(SNPs),  or  deletions/gains  of  a  string  of  nucleotides  such  as  copy  number  polymorphisms  (CNPs). 
Most  current  genetic  studies  focus  only  on  the  role  of  SNPs  in  genetic  susceptibility.  In  contrast,  few 
studies  have  explored  the  role  of  deletions/gains  in  cancer  predisposition,  due  to  limited  methods.  In 
fact,  germline  deletions/gains  are  common  in  the  human  genome  and  may  have  a  significant  impact 
on  gene  products  because  they  can  involve  an  entire  gene  or  a  significant  portion  of  a  gene.  They 
may  play  a  more  important  role  in  hereditary  PCa  (HPC),  a  type  of  PCa  that  is  likely  due  to  germline 
changes  in  major  genes. 

With the support of an Exploration-Hypothesis Development (EHD) grant from the DOD, we have
made important progress in this new research area.




Body 


The  novel  hypothesis  of  the  proposal  is  that  germline  gross  deletions/insertions  (CNPs),  as  well  as 
single  nucleotide  substitutions  (SNPs),  in  the  genome  affect  the  function  and/or  expression  of  PCa 
related  genes  and  thus  contribute  to  the  genetic  susceptibility  of  PCa. 

We had four specific aims: 1) screen for germline CNPs in the genome by measuring the allele
intensity of 500K SNPs; 2) compare the germline CNPs among relatives; 3) confirm the CNPs
using real-time PCR; and 4) test for co-segregation of the identified CNPs with prostate cancer in
these three families.

We  have  completed  all  the  specific  aims.  Some  of  the  results  are  published  in  our  recent  paper  (Liu 
2006).  They  are  briefly  described  below. 

Using  the  Affymetrix  100K  SNP  Mapping  set  to  detect  copy  number  differences.  We  genotyped 
DNA  samples  isolated  from  blood  samples  of  23  subjects  in  four  PCa  families  ascertained  at  Johns 
Hopkins  Hospital.  The  average  genotype  call  rate  of  the  100K  panel  in  these  subjects  was  99.2%. 
These  data  suggest  that  DNA  samples  isolated  from  blood  many  years  ago  are  stable  for  Affymetrix 
SNP  analysis. 

We  began  with  lymphoblastoid  cell  (LCL)  line  DNA  from  a  PCa  patient  (005-015).  Allele  intensities  of 
more  than  100K  SNPs  were  analyzed  using  default  settings  of  CNAT  (500-kb  smooth  averages)  and 
CNAG.  Two  large-scale  deletions  and  four  large-scale  gains  were  evident  in  chromosome  9  (Fig  1). 
For example, at the 9q31.1 region, 65 consecutive SNPs spanning 2.4 Mb signaled a deletion: copy
number (CN) < 1.2 and P-values < 10^-5.5 (Fig 2a). Similarly, at 9q21.2, a set of 24 consecutive SNPs
spanning 0.5 Mb revealed a gain: CN > 2.8 and P-values < 10^-4 (Fig 2b). The deletion and gain were
confirmed  by  qPCR  (Fig  2c-d).  These  data  suggest  that  the  Affymetrix  100K  SNP  panel  is  able  to 
detect  large-scale  deletions  and  insertions.  Furthermore,  these  data  provide  an  empirical  basis  for 
establishing  the  criteria  that  can  be  used  to  define  deletions  and  gains. 


Potential  artifact  of  CNPs  in  lymphoblastoid  cell  line  DNA.  Because  of  the  concern  of  potential 
in  vitro  chromosomal  copy  number  changes  in  the  course  of  cell  culture  of  LCLs,  we  attempted  to 
confirm  the  above  deletions  and  insertions  within  a  matched  blood  DNA  sample  from  the  same 
subject. At the same 9q31.1 and 9q21.2 regions where the respective deletion and gain had been
observed in LCL DNA, no evidence for a deletion was observed in the blood DNA (Fig 2e). For
9q31.1, the median CN of these 65 SNPs was 1.85, with a range of 1.6 to 2.1. There were 12
heterozygous SNPs among these 65, compared with none when the same SNPs were assayed in LCL
DNA. At the 9q21.2 region, no evidence of a gain was observed in the blood DNA (Fig 2f), and the
median CN was 1.9 with a range of 1.7 to 2.0. Consistent with these results, qPCR assays failed to
detect the deletion and insertion at these regions in blood DNA (Fig 2g-h). These data suggest that the
large-scale deletion and insertion observed in the LCL DNA of this subject were somatic changes that
occurred in the course of cell culture. In addition, the average successful SNP call rate was 99.20%
for blood DNA, in contrast to 98.68% for LCL DNA. Therefore, it appears that DNA isolated from
blood is more reliable than LCL DNA for studies of germline CNPs.


CNPs  >100  kb  are  rare  in  the  genome.  We  screened 
CNPs  in  blood  DNA  from  an  additional  22  subjects  in 
four  HPC  families  with  Affymetrix  100k  SNP  mapping 
sets. For putative deletions, we used the working criteria of a minimum of two consecutive SNPs with
the following characteristics: CN < 1.2, P-value < 10^-5.5, a separation of > 0.55 in CN between
groups of subjects, and homozygous genotypes. For putative gains, we used the working criteria of a
minimum of two consecutive SNPs with the following characteristics: CN > 2.8, P-value < 10^-4, and a
separation of > 0.55 in CN between groups of subjects. With these criteria, not a single region in the
genome  met  the  criteria  for  a  deletion  in  any  of  the  23 
subjects  analyzed.  However,  we  found  four  regions 
that  met  the  criteria  for  gains,  one  of  which  is  within  a 
known  replicon  that  is  mapped  to  several 
chromosomes.  For  the  remaining  three  regions,  we 
performed  qPCR  and  confirmed  two  of  these  gains 
(Table 1). One confirmed gain involved two SNPs
spanning 32,790 bp at 10q11 and was found among multiple subjects in all three families. This CNP
is also within a known replicon that is mapped to a single chromosomal region, and has been
previously described (Sebat 2004; Iafrate 2004). The other confirmed gain was found in multiple
subjects from a single family and involved four SNPs spanning 9,095 bp at 19q13. This is a novel
CNP and is not within a known duplicon. The remaining region that was not confirmed by qPCR was
found in a single subject. These results suggest that large-scale germline CNPs involving several
hundred kb can be detected using the Affymetrix 100K SNP panel. However, large-scale germline
CNPs were less frequent in our study samples than predicted (Sebat 2004;
Iafrate 2004).
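
The working criteria above amount to a per-SNP threshold test followed by a search for runs of consecutive qualifying SNPs. The following Python sketch illustrates that logic under stated assumptions: the `SnpCall` record, the function names, and the use of log10 P-values are hypothetical conveniences and are not part of CNAT, and the between-group CN separation criterion (> 0.55) is left out because it needs data from several subjects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from the working criteria in the text.
DEL_CN_MAX = 1.2          # deletion: CN < 1.2
DEL_LOG10_P_MAX = -5.5    # deletion: P-value < 10^-5.5
GAIN_CN_MIN = 2.8         # gain: CN > 2.8
GAIN_LOG10_P_MAX = -4.0   # gain: P-value < 10^-4
MIN_RUN = 2               # minimum number of consecutive SNPs


@dataclass
class SnpCall:
    """Hypothetical per-SNP record for one subject (not a CNAT data type)."""
    copy_number: float  # smoothed copy number estimate
    log10_p: float      # log10 of the copy number P-value
    homozygous: bool    # True if the genotype call is homozygous


def classify(snp: SnpCall) -> Optional[str]:
    """Return 'deletion', 'gain', or None for a single SNP."""
    if (snp.copy_number < DEL_CN_MAX
            and snp.log10_p < DEL_LOG10_P_MAX
            and snp.homozygous):
        return "deletion"
    if snp.copy_number > GAIN_CN_MIN and snp.log10_p < GAIN_LOG10_P_MAX:
        return "gain"
    return None


def find_candidate_regions(snps: List[SnpCall]) -> List[Tuple[str, int, int]]:
    """Find runs of at least MIN_RUN consecutive SNPs with the same label.

    Returns (label, first_index, last_index) tuples, indices inclusive.
    SNPs are assumed to be sorted by genomic position on one chromosome.
    """
    labels = [classify(s) for s in snps]
    regions = []
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if labels[i] is not None and j - i >= MIN_RUN:
            regions.append((labels[i], i, j - 1))
        i = j
    return regions
```

Requiring homozygous genotypes for deletions reflects the point made in the text that a hemizygous region cannot yield true heterozygous calls; an isolated qualifying SNP is ignored because it does not form a run.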

Fig 2. DNA copy number (CN) changes detected by the 100K SNP mapping panel of the Affymetrix
GeneChip®. A deletion at 9q31.1 (a) and a gain at 9q21.2 (b) were observed in lymphoblastoid cell line
(LCL) DNA of a subject. Quantitative real-time PCR (qPCR) confirmed the deletion (c) and gain (d) in this
sample. No deletion or gain was found in a blood DNA sample from the same individual, as measured
either by the Gene Chip (e and f, respectively), or qPCR (g and h, respectively). In panels c, d, g, and h,
the X-axis shows the averaged Ct values for the control (GAPDH) amplicon and the Y-axis shows the
test amplicons for the three dilutions of each tested DNA sample (20, 10, 5 ng). The Ct values for control
and test amplicons at each dilution of DNA were plotted against each other, and the offset between
best-fit lines along the test-amplicon axis indicates CN differences between samples. In the Gene Chip
analysis, a CN difference of 1 between samples was found in LCL DNA (c and d) and no change was found
in blood DNA (e and f).

CNPs involving 100s to 1,000s of bp are more common. While the 500-kb smooth average is a
good option for detecting large-scale CNPs, it may miss smaller CNPs because altered allele
intensities of implicated SNPs may be averaged out by the allele intensities of SNPs in the flanking
normal region. To test this, we searched the genome for small- and mid-scale CNPs by re-analyzing
the data using smaller window sizes (100-kb and 10-kb). We found three regions that met the
working criteria of deletions using a 100-kb window size, all of which were confirmed by qPCR
(Table 1). One such deletion involved ten SNPs spanning 145,676 bp at 16q21. Using a 10-kb
window size, we found nine additional regions that met the working criteria of deletions; five were
confirmed by qPCR. For the remaining four regions, mutations were found in either probe or
restriction enzyme sequences, which likely decreased the intensity of hybridization, thereby
affecting CN calculation. We also found 18 regions that met the working criteria of gains using a
100-kb window size, and three of the six regions selected for qPCR assays were confirmed. In
addition, 50 additional regions met the working criteria of gains using a 10-kb window size, and only
one of the six regions examined by qPCR assays was confirmed.


Table 1. Germline CNPs identified by 100K SNP mapping panel of Affymetrix and confirmed by quantitative real-time PCR

Type of   Detection              Chromosomal  Implicated SNPs                             Multiple  Repetitive  Genic    Previously
CNPs      method                 region       #   Positions (bp)               Size (bp)  subjects  regions     regions  reported

Deletion  100 kb smooth window   2q12.1       2   105,169,462 - 105,170,804    1,342      Y         N           N        N
                                 16q21        10  63,032,479 - 63,178,155      145,676    N         N           N        N
                                 20p12.2      2   9,868,317 - 9,893,310        24,993     N         N           N        N
          10 kb smooth window    2q32.1       2   185,000,369 - 185,002,549    2,180      Y         N           Y        N
                                 3p14.2       3   62,951,975 - 62,952,488      69,187     N         N           Y        N
                                 4q28.3       2   135,518,614 - 135,519,403    789        Y         N           N        N
                                 12q23.1      3   98,791,609 - 98,791,950      341        Y         N           Y        N
                                 12q24.23     2   118,401,599 - 118,401,998    399        Y         N           Y        N
Gain      500 kb smooth window   10q11.22     2   46,465,579 - 46,498,369      32,790     Y         Y           Y        Y
                                 19q13.41     4   57,024,188 - 57,033,283      9,095      Y         N           N        N
          100 kb smooth window   5p13.31      4   8,920,157 - 8,956,052        35,895     Y         N           Y        N
                                 14q11.2      2   18,205,576 - 18,218,052      12,476     Y         Y           Y        Y
                                 17p13.2      5   5,632,970 - 5,726,274        93,304     Y         N           Y        N
          10 kb smooth window    7p21.2       4   14,785,866 - 14,790,343      4,477      Y         N           N        N


Among  the  14  confirmed  CNPs  in  our  study,  13  were  novel.  Two  of  six  gains  were  located  within 
replicons,  but  none  of  the  eight  deletions  was  within  a  replicon.  Eight  CNPs,  including  4  deletions 
and  4  gains,  involved  either  known  genes  or  predicted  genes.  The  size  of  the  CNPs  ranged  from 
341  bp  to  145,676  bp. 

Need for a higher-resolution SNP panel. Our data suggested that the Affymetrix 100K SNP
mapping  panel  can  be  used  to  identify  germline  CNPs.  However,  we  recognize  that  our  study  may 
miss  a  substantial  number  of  smaller  CNPs  due  to  limited  SNP  resolution.  This  potential  limitation 
can  be  overcome  by  using  the  higher  resolution  500K  SNP  panel  from  Affymetrix,  which  has  a 
median  inter-SNP  distance  of  2.5  kb. 

Identification of CNPs using the Affymetrix 500K SNP Mapping Panel. We recently tested the
Affymetrix 500K panel for identifying germline CNPs. The results are encouraging and indicate that
the panel can be used to accurately identify germline CNPs.

Identifying  germline  CNPs  among  48  HapMap  subjects  using  Affymetrix  500K  mapping 
panel.  To  assess  the  feasibility  of  using  the  Affymetrix  500k  SNP  mapping  set  to  detect  CNPs,  we 
analyzed  an  Affymetrix  500k  SNP  mapping  dataset,  which  contains  allele  intensity  data  for  48 
individuals  (including  5  HapMap  CEPH  trios,  5  Yoruban  trios,  three  other  non-HapMap  trios,  and  9 
unrelated HapMap Asian samples). For putative deletions, we used the working criteria of a
minimum of three out of four consecutive SNPs with a separation of >0.5 in CN between groups of
subjects, and homozygous genotypes. For putative gains, we used the working criteria of a
minimum of three out of four consecutive SNPs with a separation of >0.5 in CN between groups of
subjects. A 1-kb smooth window was used to generate copy number. The criteria we used here are
slightly different from the ones we used previously for the 100K mapping panel because of
different computer programs (dChip vs. CNAT). We only included potential CNPs that 1) appear at
least two times in these 48 individuals, and 2) follow Mendelian inheritance in trios. These CNPs
are more likely to be true CNPs. Altogether, we identified 824 deletions and 722
amplifications. The average, median, and size ranges for these CNPs are summarized in Table 2.
Some  of  these  CNPs  are  relatively  common.  Fifty  deletions  and  forty-four  amplifications  identified 
have  frequencies  >  5%.  There  are  337  deletions  and  332  amplifications  in  the  genic  region. 

Sixteen  deletions  and  28  amplifications  involve  at  least  one  exon  (Table  2).  Interestingly,  we 
observed  several  ethnicity-specific  CNPs. 
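
The two inclusion filters described above (recurrence in at least two individuals and Mendelian inheritance in trios) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: carrier status is treated as present or absent, the only inheritance check is that a carrier child must have at least one carrier parent, and all identifiers and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A trio is (father_id, mother_id, child_id); the IDs are hypothetical labels.
Trio = Tuple[str, str, str]


def is_mendelian_consistent(carriers: Set[str], trios: List[Trio]) -> bool:
    """Check that no child carries the CNP when neither parent does.

    Simplifying assumption: de novo events are treated as inconsistencies,
    and copy-number dosage (e.g. homozygous deletions) is ignored.
    """
    for father, mother, child in trios:
        if child in carriers and father not in carriers and mother not in carriers:
            return False
    return True


def filter_cnps(cnp_carriers: Dict[str, Set[str]],
                trios: List[Trio],
                min_carriers: int = 2) -> List[str]:
    """Keep CNPs seen in at least `min_carriers` individuals that also
    pass the trio consistency check."""
    kept = []
    for cnp_id, carriers in cnp_carriers.items():
        if len(carriers) >= min_carriers and is_mendelian_consistent(carriers, trios):
            kept.append(cnp_id)
    return kept
```

A call that appears only in a child, or in a single individual, is more likely to be array noise than a germline polymorphism, which is the rationale the text gives for applying both filters.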


Table 2. Summary for CNPs identified in 48 individuals using Affymetrix 500k SNP mapping panel

Type of CNPs   # of CNPs    Average     Median      Size range   # of CNPs with allele   # of CNPs in    # of CNPs covering
               identified   size (kb)   size (kb)   (kb)         frequency > 0.05        genic regions   exonic regions

Deletion       824          47.4        18.2        1-2,395      50                      337             16
Amplification  722          53J         247         1-3,204      44                      332             28


We  identified  germline  CNPs  among  four  PCa  probands  in  our  HPC  families  using  the 
Affymetrix  500K  mapping  panel.  The  average  call  rates  were  93.32%  for  the  Sty  array  and 
95.63%  for  the  Nsp  array,  respectively.  The  same  criteria  used  to  identify  deletions  and 
amplifications  as  mentioned  above  were  applied.  Altogether,  we  identified  35  deletions  and  109 
amplifications.  Within  these  CNPs,  13  deletions  and  49  amplifications  are  in  genic  regions. 
Recently,  we  completed  genotyping  of  Affymetrix  500K  SNP  arrays  among  206  HPC  probands. 

A  deletion  involving  a  candidate  PCa  tumor  suppressor  gene.  Interestingly,  one  of  the 
identified  genic  deletions  involves  a  deletion  of  the  last  4  exons  of  a  tumor  suppressor  gene 
WWOX  (WW  domain-containing  oxidoreductase  isoform  2).  This  gene  encodes  a  protein  which 
contains  2  WW  domains  and  a  short-chain  dehydrogenase/reductase  domain  (SRD).  The  highest 
normal  expression  of  this  gene  is  detected  in  hormonally  regulated  tissues  such  as  testis,  ovary, 
and  prostate.  This  expression  pattern  and  the  presence  of  an  SRD  domain  suggest  a  role  for  this 
gene  in  steroid  metabolism.  In  addition,  it  was  also  implicated  in  tumor  necrosis  factor  (TNF)- 
mediated  cell  death,  as  well  as  p53  controlled  genotoxic  stress-induced  cell  death.  Loss  of 
heterozygosity and down-regulation of this tumor suppressor gene in PCa have been reported [11].
Homozygous deletion of WWOX exons has been reported in ovarian cancer [12] and hepatocellular
carcinoma (Yakicier 2001). Intriguingly, this particular CNP is rare in the 48 HapMap subjects
(allele frequency = 1.4%; the mother and son of one trio carry this CNP), but two out of the four
PCa subjects we genotyped carry this CNP. It is likely that germline deletions of a tumor
suppressor gene (which would be undetected in a regular LOH study), followed by somatic
deletions or mutations of the wild-type allele, contribute to tumorigenesis.

Published one paper that is directly related to this grant.

Zheng SL, Sun J, Cheng Y, Li G, Hsu FC, Zhu Y, Chang BL, Liu W, Kim JW, Turner AR, Gielzak M,
Yan G, Isaacs SD, Wiley KE, Sauvageot J, Chen HS, Gurganus R, Mangold LA, Track BJ, Gronberg
gene, is significantly associated with high-grade prostate cancers. Clin Cancer Res. 2007 Sep
1;13(17):5028-33.
transcripts in human prostate cancers. Genes Chromosomes Cancer. 2007 Nov;46(11):972-80.
Chromosomes Cancer. 2006 Nov;45(11):1018-32.

Reportable Outcomes

1)  Discovered  germline  CNPs  in  the  genome  by  measuring  the  allele  intensity  of  500K  SNPs. 

2) Most germline CNPs are inherited from parents.

3) The majority of detected CNPs can be confirmed using real-time PCR.

Conclusion


This novel and systematic approach, when applied to this high-risk hereditary study population,
increases the likelihood of identifying important genetic alterations that predispose to prostate
cancer. To our knowledge, our study is the first of its kind. If our study is successful, it will likely
contribute to our understanding of prostate cancer etiology, and provide novel targets for prostate
cancer risk assessment, prevention, and therapy.

BACKGROUND. Recent studies using ROMA and Array-CGH suggest that germline copy
number polymorphisms (CNPs) involving >100 kb are common in humans.

METHODS. In this study, we used the Affymetrix GeneChip 100K single nucleotide
polymorphism (SNP) mapping panel to further examine the type and frequency of germline
CNPs in the genome. By utilizing the allele intensity data generated while genotyping ~116,000
SNPs among 23 subjects from 4 families, we were able to detect multiple CNPs.

RESULTS. However, in contrast to several previous studies, we found that CNPs >100 kb are
rare in the genome but CNPs involving 100s-1,000s of base pairs are more common.

CONCLUSIONS. We have demonstrated the utility of this approach, which has an important
advantage over other methods because it is able to simultaneously assess both CNPs and SNPs,
and therefore has great potential in genetic association studies of common diseases. Prostate 67:
227-233, 2007. © 2006 Wiley-Liss, Inc.

KEY  WORDS:  germline;  DNA  copy  number  polymorphisms  (CNPs);  genomewide 


INTRODUCTION 

Using a representational oligonucleotide microarray
analysis (ROMA), Sebat et al. [1] performed a genome-wide
analysis of germline gross deletions/gains.
Among 20 normal subjects, they found 221 DNA copy
number differences representing 76 unique copy
number polymorphisms (CNPs) in the genome, with
a median length of 222 kb. Similarly, Iafrate et al. [2]
reported 255 CNPs in the genome among 39 unrelated
healthy individuals and 16 individuals with known
chromosomal imbalances using array-based comparative
genomic hybridization (array-CGH). Together
with several other reports [3-5], it appears germline
CNPs in the genome are more common than previously
estimated.


To further understand germline CNPs in the
genome, we utilized the 100K single nucleotide polymorphism
(SNP) mapping panel of the Affymetrix
GeneChip for systematic detection of germline CNPs.
This alternative is appealing because it allows for
measurement of both allele intensity and genotypes of
SNPs, and this combination provides information that
is important in defining CNPs, especially deletions.

MATERIALS  AND  METHODS 

Microarray  Analysis 

Four families, comprising 23 subjects (men with or without
prostate cancer, and women), were selected from a large
set of 188 hereditary prostate cancer families collected
at Johns Hopkins Hospital [6]. Genomic DNA from
blood and lymphoblastoid cell lines (LCL) was assayed
using the 100K SNP mapping panel following the
manufacturer's standard protocol. Briefly, 250 ng of
genomic DNA was digested with either Hind III or Xba
I, and then ligated to adapters that recognize the
cohesive four basepair (bp) overhangs. A generic
primer that recognizes the adapter sequence was used
to amplify adapter-ligated DNA fragments with PCR
conditions optimized to preferentially amplify fragments
in the 250-2,000 bp size range in a GeneAmp
PCR System 9700 (Applied Biosystems, Foster City,
CA). After purification with a QIAGEN MinElute 96 UF
PCR purification system, a total of 40 µg of PCR product
was fragmented and about 2.9 µg was visualized on a
4% TBE agarose gel to confirm that the average size was
smaller than 180 bp. The fragmented DNA was then
labeled with biotin and hybridized to the GeneChip
Mapping 100K Set for 17 hr at 48°C in a Hybridization
Oven 640. We washed and stained the arrays using
Affymetrix Fluidics Station 450 and scanned the arrays
using a GeneChip Scanner 3000 G7 (Affymetrix, Inc.,
Santa Clara, CA).

The Affymetrix GeneChip Operating Software
(GCOS) collected and extracted feature data from
Affymetrix GeneChip Scanners. We used the GeneChip
DNA analysis software (GDAS) to analyze cell intensity
data stored in the GCOS Database for genotyping
using a Dynamic Model mapping algorithm. We
calculated DNA copy numbers using Chromosome
Copy Number Analysis Tool (CNAT) version 2.0, Copy
Number Analyzer for Affymetrix GeneChip (CNAG)
[7], and dChipSNP [8]. Gaussian kernel-smoothing
average was used for averaging the copy number and
P-value of individual SNPs over a fixed genomic
interval (10, 100, or 500 kb window size). The smoothing
averages out the random noise across flanking
SNPs and minimizes the false-positive rate, while
keeping the true-positive rate high. The kernel-smoothing
accentuates genomic intervals in which consecutive
SNPs display the same type of alteration (gain or loss).
The 100K SNP mapping panel contains 116,204 SNP
probes, with a median physical distance between SNPs
of 8.5 kb. The average successful SNP call rates were
99.20 and 98.68% for blood and LCL DNA, respectively.
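
To make the effect of the window size concrete, here is a minimal Python sketch of a Gaussian kernel-weighted average over a fixed genomic window. It is an illustration only and not the CNAT implementation: the function name is hypothetical, and setting the kernel standard deviation to one quarter of the window size is an assumption made for this example.

```python
import math
from typing import List, Sequence


def gaussian_smooth(positions: Sequence[int],
                    values: Sequence[float],
                    window_bp: int) -> List[float]:
    """Gaussian kernel-weighted average of per-SNP values.

    For each SNP, neighbours within +/- window_bp / 2 contribute with a
    weight that decays with genomic distance. The choice
    sigma = window_bp / 4 is an assumption of this sketch.
    """
    sigma = window_bp / 4.0
    half_window = window_bp / 2.0
    smoothed = []
    for center in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            distance = pos - center
            if abs(distance) > half_window:
                continue  # outside the smoothing window
            weight = math.exp(-0.5 * (distance / sigma) ** 2)
            weighted_sum += weight * val
            weight_total += weight
        # weight_total >= 1 because each SNP always includes itself
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

With SNPs spaced 8.5 kb apart (the panel's median spacing) and a two-SNP deletion of CN 1 in an otherwise CN 2 region, a 10-kb window leaves the deleted SNPs at CN 1, whereas a 500-kb window pulls their smoothed value back toward 2. This is the dilution of small CNPs by flanking normal SNPs that motivates re-analysis with smaller windows.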


Confirmation via qPCR

A  subset  of  putative  CN  changes  was  subjected  to 
confirmation  by  quantitative  real-time  PCR  (qPCR) 
using  the  ABI  Prism  7000  Sequence  Detection  System, 
and  direct  sequencing  using  the  ABI  3700  DNA 
Analyzer.  For  qPCR,  primers  were  designed  using 
Primer  Express  1.5  software  from  Applied  Biosystems. 
Amplicons  were  designed  against  the  putatively 
hemizygous  locus  and  a  control  locus  of  known  normal 
copy  number.  The  PCR  kinetics  at  the  control  locus  was 
used  to  control  for  sample-to-sample  differences  in 
genomic DNA purity and concentration. Three concentrations
of each genomic DNA sample (20, 10, and
5 ng) were assayed in duplicate, using each pair of real-time
PCR primers. PCRs were prepared as follows: in
20 µl, we combined 2 µl of genomic DNA, 0.05 µM of
each primer, and SYBR-Green PCR Master Mix from
Applied Biosystems. PCRs were performed as follows:
95°C for 10 min, followed by 40 cycles at 95°C for 20 sec,
and 60°C for 1 min. An additional cycle of 95°C for 15 sec,
60°C for 20 sec, and 95°C for 15 sec was run at the end
to measure the dissociation curve for quality control.
We  used  the  Sequence  Detection  Software  (SDS)  for 
PCR  baseline  subtraction  and  exported  the  threshold 
cycle  number  (Ct)  data  for  analysis.  Ct  values  for  the 
control  and  test  amplicons  for  the  three  dilutions  of 
each  DNA  sample  were  plotted  against  each  other,  and 
the  offset  between  two  samples  along  the  control- 
amplicon  axis  and  test-amplicon  axis  was  measured. 
An offset of 0.8-1.2 along the test-amplicon axis was
taken to indicate a two-fold (1:2) copy number
difference between the two samples at that locus. For direct
sequencing, we designed PCR primers to cover the
entire regions of the Hind III or Xba I restriction
fragments using Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi);
the primers were synthesized
by Integrated DNA Technologies, Inc (Coralville, IA).
We used Platinum Pfx DNA polymerase from Invitrogen
to produce PCR products for sequencing. All PCR
products  were  purified  using  the  QIAquick  PCR 
purification  Kit  (Qiagen)  to  remove  deoxynucleoside 
triphosphates  and  excess  primers.  We  performed  all 
sequencing  reactions  using  dye-terminator  chemistry 
(BigDye;  ABI,  Foster  City,  CA),  and  separated  the 
products  in  an  ABI  3700  DNA  Analyzer.  We  identified 
SNPs  and  other  small  changes  using  Sequencher 
software  version  4.0.5  (Gene  Codes  Corporation). 
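
The qPCR comparison described above (plotting test-amplicon Ct against control-amplicon Ct for each sample and measuring the offset between the fitted lines) can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the authors' analysis code: the function names are hypothetical, the lines are fitted by ordinary least squares, and the offset is evaluated at the mean control Ct of the two samples, which is an assumption of this sketch.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplicon_offset(ref_control_ct: Sequence[float],
                    ref_test_ct: Sequence[float],
                    smp_control_ct: Sequence[float],
                    smp_test_ct: Sequence[float]) -> float:
    """Offset (sample minus reference) along the test-amplicon axis,
    evaluated at the mean control Ct of both samples."""
    ref_slope, ref_icpt = fit_line(ref_control_ct, ref_test_ct)
    smp_slope, smp_icpt = fit_line(smp_control_ct, smp_test_ct)
    x0 = mean(list(ref_control_ct) + list(smp_control_ct))
    return (smp_slope * x0 + smp_icpt) - (ref_slope * x0 + ref_icpt)


def describe_offset(offset: float, low: float = 0.8, high: float = 1.2) -> str:
    """Apply the 0.8-1.2 cycle rule from the text.

    A positive offset means the test amplicon needs about one extra cycle,
    i.e. roughly half as many template copies as in the reference sample.
    """
    if low <= offset <= high:
        return "two-fold lower"
    if -high <= offset <= -low:
        return "two-fold higher"
    return "no two-fold difference"
```

Using the control amplicon as the x-axis is what normalizes for differences in DNA input: a sample that is simply more dilute shifts along the fitted line, while a copy number change shifts the line itself.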

RESULTS  AND  DISCUSSION 

We  began  with  an  LCL  DNA  sample  from  a  prostate 
cancer  patient  (005-015).  Allele  intensities  of  over  100K 
SNPs  were  analyzed  using  default  settings  of  CNAT 
(500-kb  smooth  averages)  and  CNAG.  Two  large-scale 
deletions  and  four  large-scale  gains  were  evident  in 
chromosome 9 (Fig. 1). For example, at the 9q31.1
region, 65 consecutive SNPs spanning 2.4 Mb signaled
a deletion: CN < 1.2 and P-values < 10^-5.5 (Fig. 2a).
Similarly, at 9q21.2, a set of 24 consecutive SNPs
spanning 0.5 Mb revealed a gain: CN > 2.8 and
P-values < 10^-4 (Fig. 2b). The deletion and gain were
confirmed by qPCR (Fig. 2c,d). These data suggest that
the Affymetrix 100K SNP panel is able to detect
large-scale deletions and insertions. Furthermore, these data
provide an empirical basis for establishing the criteria that
may be used to define deletions and insertions.

Because of the potential concern of in vitro chromosomal
copy number changes in the course of cell culture
of LCLs, we attempted to confirm the above deletions
and insertions within a matched blood DNA sample
from the same subject using the Xba chip of the 100K
SNP panel. At the same 9q31.1 and 9q21.2 regions
where the respective deletion and gain had been
observed in LCL DNA, no evidence for a deletion was
observed in the blood DNA (Fig. 2e). For 9q31.1, the
median CN of these 65 SNPs was 1.85, with a range of
1.6-2.1. There were 12 heterozygous SNPs among these 65,
compared with none when the same
SNPs were assayed in LCL DNA. At the 9q21.2 region,
no evidence of a gain was observed in the blood DNA
(Fig. 2f), and the median CN was 1.9 with a range of
1.7-2.0. Consistent with these results, quantitative real-time
PCR assays failed to detect the deletion and
insertion at these regions in blood DNA (Fig. 2g,h).
These data suggest that the large-scale deletion and insertion
observed in the LCL DNA of this subject were
somatic changes that occurred in the course of cell
culture. Therefore, it appears that DNA isolated from
blood is more reliable than LCL DNA for studies of
germline CNPs.

We  therefore  used  blood  DNA  for  the  remaining 
analyses.  Our  primary  goal  was  to  detect  large-scale 
CNPs  involving  >100  kb  in  the  genome  using  CNAT 
analyses  with  a  500-kb  window  size.  We  screened  for 
putative  deletions  using  the  working  criteria  of  a 
minimum  of  two  consecutive  SNPs  with  the  following 
characteristics: CN < 1.2, P-value < 10^-5.5, a separation
of > 0.55 in CN between groups of subjects, and
homozygous genotypes. We screened for putative
gains using the working criteria of a minimum of two
consecutive SNPs with the following characteristics:
CN > 2.8, P-value < 10^-4, and a separation of > 0.55 in
CN between groups of subjects. With these criteria, not

Fig. 1. Genome-wide copy number analysis of GeneChip 50K Xba array data of 005-015 lymphoblastoid cell line (LCL) DNA using CNAG. Chr,
chromosome. Red dots, raw log2 ratio for each SNP. Green line, inferred copy number from a hidden Markov model. Blue curves, local mean
analysis of ten consecutive SNPs. Green bar, heterozygous SNP calls. LOH is marked under the chromosome with
blue lines.

Fig. 2. DNA copy number (CN) changes detected by 100K SNP mapping panel of the Affymetrix GeneChip®. A deletion at 9q31.1 (a) and a
gain at 9q21.2 (b) were observed in LCL DNA of a subject. Quantitative real-time PCR (qPCR) confirmed the deletion (c) and gain (d) in this
sample. No deletion or gain was found in a blood DNA sample from the same individual, as measured either by the Gene Chip (e and f,
respectively), or qPCR (g and h, respectively). In panels c, d, g, and h, the X-axis shows the averaged Ct values for the control (GAPDH) amplicon and the
Y-axis shows the test amplicons for the three dilutions of each tested DNA sample (20, 10, 5 ng). The Ct values for control and test amplicons at
each dilution of DNA were plotted against each other, and the offset between best-fit lines along the test-amplicon axis indicates CN differences
between samples. In the Gene Chip analysis, a CN difference of one between samples was found in LCL DNA (c and d) and no change was found in
blood DNA (e and f).

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a  single  region  in  the  genome  met  the  criteria  for  a 
deletion  for  any  of  the  23  subjects  analyzed.  However, 
we  found  four  regions  that  met  the  criteria  for  gains, 
one  of  which  is  within  a  known  replicon  that  is  mapped 
to  several  chromosomes.  For  the  remaining  three 
regions,  we  performed  qPCR  and  confirmed  two  of 
these  gains  (Table  I).  One  confirmed  gain  involved  two 
SNPs spanning 32,790 bp at 10q11 (46,465,579-
46,498,369  bp)  and  was  found  among  multiple  subjects 
in  all  three  families.  This  CNP  is  also  within  a  known 
replicon  that  is  mapped  to  a  single  chromosomal 
region,  and  has  been  previously  described  [1,2].  The 
other  confirmed  gain  was  found  in  multiple  subjects 
from  a  single  family  and  involved  four  SNPs  spanning 
9,095 bp at 19q13 (57,024,188-57,033,283 bp). This is a
novel  CNP  and  is  not  within  a  known  duplicon.  The 
remaining  region  that  was  not  confirmed  by  qPCR  was 
found  in  a  single  subject.  These  results  suggest  that 
large-scale  germline  CNPs  involving  several  hundred 
kbs  can  be  detected  using  the  Affymetrix  100K  SNP 
panel. However, large-scale germline CNPs were less
common in our study samples than
predicted by Sebat et al. [1] and Iafrate et al. [2].
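
The working criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the study: the record layout, the function names, and the two intermediate SNP positions in the example are hypothetical, and the deletion P-value threshold is read as 10^-5.5. The thresholds themselves (CN < 1.2 or CN > 2.8, separation > 0.55, at least two consecutive SNPs, homozygous genotypes for deletions) are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SnpRecord:
    """One SNP after window smoothing (field names are illustrative)."""
    position_bp: int    # physical position on the chromosome
    cn: float           # smoothed copy number estimate
    p_value: float      # P-value attached to the CN estimate
    separation: float   # CN separation between groups of subjects
    homozygous: bool    # True if the genotype call is homozygous


def passes_deletion(s: SnpRecord) -> bool:
    # Deletion criteria; 10**-5.5 is an assumed reading of the threshold.
    return (s.cn < 1.2 and s.p_value < 10 ** -5.5
            and s.separation > 0.55 and s.homozygous)


def passes_gain(s: SnpRecord) -> bool:
    # Gain criteria; genotype is not part of the gain criteria.
    return s.cn > 2.8 and s.p_value < 1e-4 and s.separation > 0.55


def putative_regions(snps: List[SnpRecord],
                     passes: Callable[[SnpRecord], bool],
                     min_run: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for runs of consecutive passing SNPs."""
    regions, run = [], []
    for s in snps:  # snps must be sorted by position
        if passes(s):
            run.append(s)
            continue
        if len(run) >= min_run:
            regions.append((run[0].position_bp, run[-1].position_bp, len(run)))
        run = []
    if len(run) >= min_run:
        regions.append((run[0].position_bp, run[-1].position_bp, len(run)))
    return regions


# Toy input: a four-SNP gain bounded by the 19q13 coordinates quoted above
# (the two inner positions and all CN/P values are made up), plus one
# isolated outlier that should be ignored because it is a run of one.
example_snps = [
    SnpRecord(57_000_000, 2.0, 0.50, 0.05, False),
    SnpRecord(57_024_188, 3.0, 1e-6, 0.90, False),
    SnpRecord(57_027_000, 3.1, 1e-6, 0.95, False),
    SnpRecord(57_030_000, 2.9, 1e-5, 0.80, True),
    SnpRecord(57_033_283, 3.0, 1e-6, 0.90, False),
    SnpRecord(57_050_000, 2.0, 0.40, 0.02, True),
    SnpRecord(57_100_000, 3.1, 1e-6, 0.90, False),
    SnpRecord(57_120_000, 2.1, 0.30, 0.04, False),
]

print(putative_regions(example_snps, passes_gain))
print(putative_regions(example_snps, passes_deletion))
```

Requiring a run of at least two SNPs is what removes the isolated outlier; any region reported this way is still only putative until it is confirmed by qPCR.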

While  the  500-kb  smooth  average  is  a  good  option  for 
detecting  large-scale  CNPs,  it  may  miss  smaller  CNPs 
because  altered  allele  intensities  of  implicated  SNPs 
may  be  averaged  out  by  the  allele  intensities  of  SNPs  in 
the  flanking  normal  region.  We  therefore  searched  the 
genome  for  small-  and  mid-scale  CNPs  by  re-analyzing 
the  data  using  smaller  window  sizes  (100-  and  10-kb). 
We  found  three  regions  that  met  the  working  criteria  of 
deletions  using  a  100-kb  window  size,  all  of  which  were 
confirmed  by  qPCR  (Table  I).  One  such  deletion 
involved  10  SNPs  spanning  145,676  bp  at  16q21.  Using 
a  10-kb  window  size,  we  found  nine  additional  regions 
that  met  the  working  criteria  of  deletions;  five  were 
confirmed  by  qPCR  (Table  I).  For  the  remaining  four 
regions,  mutations  were  found  in  either  probe  or 
restriction  enzyme  sequences,  which  likely  decreased 
the  intensity  of  hybridization  thereby  affecting  CN 
calculation.  We  also  found  18  regions  that  met  the 
working criteria of gains using a 100-kb window size,
and 3 of the 6 regions selected for qPCR assays were
confirmed (Table I). A further 50 regions
met the working criteria of gains using a 10-kb window
size,  and  only  1  of  the  6  regions  examined  by  qPCR 
assays  was  confirmed  (Table  I). 
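
The dilution effect that motivates the smaller windows can be illustrated with a toy calculation. The sketch below assumes a simple unweighted mean over all SNPs within half a window on either side of each SNP; the smoothing actually implemented by the analysis software may differ. The SNP spacing (one every 5 kb) and the three-SNP deletion are invented for the example.

```python
from typing import List, Sequence


def window_mean(positions_bp: Sequence[int], cn: Sequence[float],
                window_bp: int) -> List[float]:
    """Average CN over all SNPs within window_bp / 2 of each SNP."""
    half = window_bp / 2
    smoothed = []
    for center in positions_bp:
        values = [c for p, c in zip(positions_bp, cn) if abs(p - center) <= half]
        smoothed.append(sum(values) / len(values))
    return smoothed


# Hypothetical 1-Mb stretch with one SNP every 5 kb; three adjacent SNPs
# (a 10-kb span) carry a hemizygous deletion (CN = 1), the rest are CN = 2.
positions = list(range(0, 1_000_001, 5_000))
raw_cn = [1.0 if 500_000 <= p <= 510_000 else 2.0 for p in positions]

for window_kb in (500, 100, 10):
    lowest = min(window_mean(positions, raw_cn, window_kb * 1_000))
    print(f"{window_kb:>3}-kb window: lowest smoothed CN = {lowest:.2f}")
```

In this toy case only the 10-kb window brings the smoothed value below the CN < 1.2 deletion threshold; with the 100-kb and 500-kb windows the three deleted SNPs are averaged with many flanking normal SNPs and the region would not be flagged.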

Among  the  14  confirmed  CNPs  in  our  study,  13  were 
novel.  Two  of  six  gains  were  located  within  replicons, 
but  none  of  the  eight  deletions  was  within  a  replicon. 
Eight CNPs, including four deletions and four gains,
involved either known genes or predicted genes. The
size  of  the  CNPs  ranged  from  341  to  145,676  bp. 

Altogether, we observed numerous germline CN
changes in the genome, the majority of which were
small (100s-1,000s bp). A considerable difference
between our findings and those of the previous studies was
that CNPs >100 kb were rare. A combination of factors
may  account  for  the  difference.  First,  the  sample  sizes 
and  characteristics  of  study  subjects  differ.  The  Sebat 
study  [1]  included  20  unrelated  healthy  subjects,  and 
the  Iafrate  study  [2]  included  39  unrelated  healthy 
subjects  and  16  subjects  with  known  chromosomal 
imbalances.  Our  study  included  23  subjects  from  four 
prostate  cancer  families.  These  23  subjects  represent  8 
independent founders. Based on the estimated
frequency of large-scale CNPs in the genome [1], though,
we would expect to observe 77 CNPs of ~222 kb in our
study.  Second,  the  genome  coverage  of  the  SNPs  and 
probes  differs.  However,  the  100K  SNP  panel  provides 
similar  coverage  of  the  genome  and  of  previously 
reported CNP regions. For example, among the 76
unique  CNPs  identified  by  Sebat  [1],  51  CNPs  are 
covered  by  at  least  two  SNPs,  and  the  majority  are 
covered  by  >10  SNPs.  Third,  the  source  of  genomic 
DNA  may  affect  the  frequency  of  CNPs.  Sebat  et  al.  [1] 
examined  genomic  DNA  from  LCLs,  Iafrate  et  al.  [2] 
examined  DNA  from  both  LCLs  and  blood,  and  we 
examined  DNA  from  blood.  Artifact  CNPs  may  arise  in 
cell  lines  during  the  steps  of  generation,  manipulation, 
and  culturing.  This  possibility  was  demonstrated  by 
the  comparison  of  CNPs  between  the  matched  DNA 
samples  isolated  from  blood  and  LCL  of  the  same 
individual  (005-015)  in  our  study.  It  is  unclear  whether 
the  different  results  of  LCL  and  blood  DNA  are  specific 
to  this  cell  line.  The  potential  for  artifacts  that  may 
appear  to  be  CNPs  should  be  carefully  considered 
whenever  cell  lines  are  assayed  for  the  presence  of 
CNPs. Fourth, the choice of reference samples differs.
Our  use  of  a  large  number  of  subjects  (N  =  110)  as  a 
reference  group  may  lead  to  more  stable  results 
compared  to  a  single  subject  reference  in  both  ROMA 
and  BAC  array-CGH.  Finally,  our  utilization  of 
genotype  information  (e.g.,  heterozygous  results),  and 
our  efforts  to  confirm  many  of  the  putative  CNPs  by 
qPCR  and  direct  sequencing  in  this  study  may  guard 
against  false  positive  deletions. 

The  frequencies  of  copy  number  gains  and  losses  are 
dependent  on  the  degree  of  genetic  diversity  within  the 
population  used  for  the  study  and  on  the  resolution  of 
the  microarray  platform.  The  genetic  diversity  within 
our  study  was  limited  because  the  study  population 
was  composed  of  23  Caucasian  subjects  from  4  families. 
The 100K SNP microarray used in this study has better
resolution than the platforms used in most of the previous studies; however,
as the resolution of microarrays increases in future
studies, the frequency of identifiable CNPs in humans
will certainly increase. Therefore, it is premature to
estimate  the  frequency  of  CNPs  in  the  human  genome 
based upon this study. It is more appropriate to
summarize by stating that we have demonstrated
smaller  CNPs  were  much  more  frequent  than  large 
CNPs.  The  results  of  our  study  can  best  contribute  to 
a  more  accurate  estimate  of  CNPs  in  the  human 
genome  as  our  methods  are  applied  to  additional 
study  populations,  using  microarrays  of  increasing 
resolution. 

There  are  several  reasons  that  many  of  the  CNPs 
detected  by  SNP  arrays  could  not  be  confirmed  by 
qPCR.  First,  the  algorithm  used  by  the  software  to 
estimate  DNA  copy  number  affects  the  accuracy  of  the 
copy  number  calculation,  which  is  beyond  the  scope  of 
this  work.  Second,  the  conditions  of  PCR-based 
preparation  of  DNA  for  microarray  hybridization 
might  randomly  cause  artificial  imbalances  in  different 
sequences of the genome. Third, mutations in the
sequences of the SNP probes and restriction enzyme
recognition sites are the most obvious factors that may affect the
intensity of hybridization, thereby affecting the
accuracy of the copy number calculation. Overall, this
means  all  potential  CNPs  should  be  verified  by 
additional  molecular  approaches  before  they  are 
considered  confirmed. 

While  we  were  preparing  our  manuscript,  three 
additional  studies  examining  germline  deletions 
based  on  SNP  information  in  the  human  population 
were  published  [9-11].  In  addition  to  identifying  a 
large  number  of  germline  deletions  in  the  genome, 
their results also suggest that germline deletions,
with median lengths ranging from 500 bp to 10.5 kb,
are smaller than those reported in the studies of Sebat [1], Iafrate [2],
and Tuzun [3]. Although the conclusions regarding
germline  deletions  are  similar  between  these  three 
SNP-based  studies  and  our  study,  there  are  at  least 
two  major  differences.  One  is  the  methods  used  to 
identify CNPs; the studies by Conrad [9] and McCarroll [11]
relied on family-based methods, which are
not  applicable  to  many  case-control  association 
studies.  In  contrast,  our  method  utilized  the  allele 
intensity  data  generated  from  SNP  genotyping  and 
can  be  readily  applied  to  both  family-based  and  case- 
control studies. This is an important issue because
Affymetrix SNP genotyping intensity data are
being generated for qualitative genotyping as
part of several genome-wide SNP association studies
currently underway. Another difference is that our
method  can  identify  both  germline  deletions  and 
gains,  while  the  other  three  recent  studies  focused 
only  on  germline  deletions.  In  fact,  we  found  that 
there  are  more  gains  than  deletions  in  our  study 
population.  The  unique  ability  of  our  method  to 
examine  both  deletions  and  gains  allowed  us  to  fully 
test  the  claims  of  the  two  initial  reports  [1,2]  that  large- 
scale  germline  CNPs  involving  >100  kb  are  common 
in  the  genome. 


The Prostate DOI 10.1002/pros


Germline DNA Copy Number Polymorphisms 233


CONCLUSIONS 

We  have  shown  that  by  combining  data  on  the  allele 
intensity  and  genotype  of  SNPs,  we  can  detect  CNPs 
in  the  genome.  The  fact  that  a  substantial  number  of 
detected  CN  changes,  especially  deletions,  can  be 
confirmed  by  qPCR  and  direct  sequencing,  as  well  as 
the  fact  that  the  same  CNPs  were  detected  in  multiple 
individuals  within  a  family,  validates  this  method.  If 
CNPs  of  >100  kb  are  frequent  in  the  genome,  this 
method  should  be  able  to  detect  at  least  a  large  subset  of 
these  large-scale  CNPs.  On  the  other  hand,  we 
recognize  that  our  study  may  miss  a  substantial 
number of smaller CNPs due to limited SNP resolution.
This  potential  limitation  can  be  overcome  by  using 
the  higher  resolution  500K  SNP  panel  from  Affymetrix. 

Based  on  the  observed  sizes  and  frequencies  of 
CNPs,  a  major  implication  of  our  study  is  that  future 
examinations  of  germline  CNPs  should  focus  on 
smaller-scale  deletions  and  gains.  There  is  a  growing 
list  of  genes  that  seem  to  intersect  with  the  sites  of 
deletion.  Conrad  and  colleagues  [9],  for  example, 
identified  92  genes  that  were  entirely  deleted  and 
another  109  genes  in  which  coding  sequences  were 
partially  eliminated.  These  types  of  CNPs,  together 
with SNPs, which are present at much higher frequencies,
may affect the function and expression of disease
risk  and  modifier  genes.  The  SNP  mapping  panel  offers 
a  unique  possibility  to  simultaneously  assess  the  roles 
of  SNPs  and  CNPs  in  disease  risk  [12],  although  it  is 
premature  to  draw  a  conclusive  connection  between 
the  frequency  of  confirmed  germline  CNPs  and  cancer 
risk  based  on  the  data  from  these  23  subjects. 
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Deletion  of  a  Small  Consensus  Region  at  6q15,  Including  the 
MAP3K7  Gene,  Is  Significantly  Associated  with 
High-Grade  Prostate  Cancers 

Wennuan  Liu,1  Bao-Li  Chang,1  Scott  Cramer,2  Patrick  P.  Koty,3  Tao  Li,1  Jishan  Sun,1  Aubrey  R.  Turner,1 
Chris  Von  Kap-Herr,3  Peggy  Bobby,3  Jianyu  Rao,4  S.  Lilly  Zheng,1  William  B.  Isaacs,5  and  Jianfeng  Xu1 


Abstract Purpose: Chromosome 6q14-21 is commonly deleted in prostate cancers, occurring in ~22% of
all tumors and ~40% of metastatic tumors. However, candidate prostate tumor suppressor genes
in  this  region  have  not  been  identified,  in  part  due  to  the  large  and  broad  nature  of  the  deleted 
region  implicated  in  previous  studies. 

Experimental  Design:  We  first  used  high-resolution  Affymetrix  single  nucleotide  polymorphism 
arrays  to  examine  DNA  from  malignant  and  matched  nonmalignant  cells  from  55  prostate  cancer 
patients. We identified a small consensus region on 6q14-21 and evaluated the deletion status
within the region among 40 additional tumor and normal pairs using quantitative PCR and
fluorescence in situ hybridization. We finally tested the association between the deletion and Gleason
score using Fisher's exact test.

Results: Tumors  with  small,  interstitial  deletions  at  6q14-21  defined  an  817-kb  consensus  region 
that  is  affected  in  20  of  21  tumors.  The  MAP3K7  gene  is  one  of  five  genes  located  in  this  region. 
In  total,  MAP3K7  was  deleted  in  32%  of  95  tumors.  Importantly,  deletion  of  MAP3K7  was 
highly associated with higher-grade disease, occurring in 61% of tumors with Gleason score
≥8 compared with only 22% of tumors with Gleason score ≤7. The difference was highly
significant (P = 0.001).

Conclusion:  Our  study  provides  strong  evidence  for  the  first  time  that  a  small  deletion  at  6q15, 
including  the  MAP3K7  gene  and  four  other  genes,  is  associated  with  high-grade  prostate 
cancers.  Although  the  deletion  may  be  a  marker  for  high-grade  prostate  cancer,  additional  studies 
are needed to understand its molecular mechanisms.


Many  if  not  most  prostate  cancers  do  not  pose  a  major 
health  threat  to  their  hosts.  The  molecular  factors  responsible 
for  variations  in  the  aggressiveness  of  prostate  cancers  are 
poorly defined. Deletion of DNA sequences from
chromosome 6q14-21 is one of the most common deletion events in
the genome of prostate tumors (reviewed in ref. 1). In a recent
study that estimated the frequency of DNA copy number
alterations in the prostate cancer genome based on all
published comparative genomic hybridization studies of
prostate cancers, about one quarter of the 891 prostate
cancers had a deletion at 6q14-21 (2). More importantly,
the  deletion  seemed  to  be  more  common  in  metastatic/ 
advanced  tumors  (40%)  than  in  localized/primary  tumors 
(19%).  Despite  this  overwhelming  evidence  for  frequent  6q 
deletions,  specific  prostate  tumor  suppressor  genes  have  not 
been  identified  in  this  region.  One  of  the  major  obstacles  is 
the  size  of  the  deleted  region  implicated  in  previous  studies, 
due  at  least  in  part  to  limited  resolution  of  the  detection 
methods. In our combined analysis of all published
comparative genomic hybridization studies, the deleted region at 6q
spans ~30 Mb (2). The large number of genes (>170) located
within the broad deletion interval poses a significant challenge
to effective searches for tumor suppressor genes in the region.
Therefore, efforts are needed to define a smaller candidate
region. Higher-resolution detection methods, such as
representational oligonucleotide microarray analysis and single
nucleotide  polymorphism  (SNP)  arrays,  may  be  helpful  by 
identifying  smaller  deletions  and  delineating  detailed  deletion 
patterns,  such  as  interstitial  deletions  (3,  4). 

In  this  study,  we  used  high-resolution  Affymetrix  SNP  arrays, 
fluorescence  in  situ  hybridization  (FISH),  and  quantitative 
PCRs  (qPCR)  to  examine  deletion  patterns  at  6q  among  95 
prostate tumors. Our goal was to identify a small consensus
region, evaluate candidate genes within the region using
various molecular methods, and examine the possible
correlation between gene deletion and tumor aggressiveness.

Materials  and  Methods 

Subjects.  All  tumors  analyzed  in  the  study  were  from  independent 
prostate  cancer  patients  undergoing  radical  prostatectomy  for  treatment 
of clinically localized disease at Johns Hopkins Hospital. We selected a
subset of subjects from whom genomic DNA of sufficient quantity (>5
μg) and purity (>70% cancer cells for cancer specimens, no detectable
cancer cells for normal samples) could be obtained by dissection of
matched nonmalignant (hereafter referred to as "normal") and
cancer-containing areas of prostate tissue as determined by histologic
evaluation  of  H&E-stained  frozen  sections  of  snap-frozen  radical 
prostatectomy  specimens.  Genomic  DNA  was  isolated  from  trimmed 
frozen  tissues  as  described  previously  (5). 

Detection of 6q deletions using Affymetrix SNP arrays. We first used
Affymetrix SNP array panels to detect DNA copy number alterations in
the  genome  among  55  prostate  cancers,  and  for  this  study,  we  focused 
entirely  on  6q  deletions.  We  used  the  100K  SNP  array  for  the  first  22 
subjects  (4)  and  then  used  the  500K  SNP  array  (Affymetrix,  Inc.)  for  the 
remaining  41  subjects.  Eight  of  the  subjects  were  analyzed  using  both 
100K and 500K SNP arrays. All of the reagents used for the assay were
obtained  from  the  manufacturers  recommended  by  Affymetrix.  We 
digested,  ligated,  amplified,  fragmented,  and  labeled  the  samples  and 
hybridized,  washed,  and  stained  the  SNP  mapping  arrays  according  to 
the  manufacturer's  instructions.  DNA  copy  number  was  calculated 
based  on  allele  intensity  using  two  different  software  packages:  Copy 
Number  Analyzer  for  Affymetrix  GeneChip  (CNAG2.0;  ref.  6)  and 
dChip  analyzer  (7).  Allele-specific  analysis  was  also  done  to  estimate 
DNA  copy  number  for  each  chromosome  pair  using  CNAG2.0.  The 
results  from  all  three  types  of  analyses  using  CNAG2.0  and  dChip  were 
consistent,  and  we  show  the  output  from  CNAG2.0  in  Fig.  1.  The 
physical  positions  of  detected  deletions  were  determined  based  on  the 
Human hg17 Assembly (National Center for Biotechnology
Information Build 35).

Detection of DNA deletion at MAP3K7 using qPCR. qPCR analysis
was  used  to  evaluate  the  deletion  status  of  the  MAP3K7  gene  in  the  55 
tumors  described  above  and  among  40  additional  prostate  tumors  that 
were  not  evaluated  by  SNP  arrays.  The  qPCR  analysis  was  done  using 
the  ABI  Prism  7000  Sequence  Detection  System  as  described  in  detail 
elsewhere (4). A primer set located in the last exon of MAP3K7 was used
to amplify the test amplicon (forward primer: 5'-AACGGTCCCAGACAATCATCAACTGC-3';
reverse primer: 5'-GAGGTCATCAGAACTCAGCAGCAGAA-3'), and a primer set located
around the junction of intron 2 and exon 3 of GAPDH was used to amplify the
control amplicon (forward primer: 5'-TCCTCATGCCTTCTTGCCTCTTGT-3'; reverse
primer: 5'-AGGCGCCCAATACGACCAAATCTA-3').
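
The exact qPCR calculation is described in ref. 4 and is not reproduced here. As a rough illustration only, the widely used 2^-ΔΔCt approximation shows how test (MAP3K7) and control (GAPDH) Ct values from tumor and matched normal DNA can be turned into a relative copy number. The sketch assumes near-100% amplification efficiency for both amplicons and two copies in the normal sample; the function name and all Ct values are hypothetical.

```python
def relative_copy_number(ct_test_tumor: float, ct_control_tumor: float,
                         ct_test_normal: float, ct_control_normal: float,
                         normal_copies: float = 2.0) -> float:
    """Estimate tumor copy number of the test amplicon via 2^-ddCt.

    Assumes ~100% PCR efficiency for both amplicons and that the matched
    normal sample carries `normal_copies` copies of the test locus.
    """
    delta_tumor = ct_test_tumor - ct_control_tumor      # normalize to control
    delta_normal = ct_test_normal - ct_control_normal
    ddct = delta_tumor - delta_normal                   # tumor relative to normal
    return normal_copies * 2.0 ** (-ddct)


# Hypothetical Ct values: in the tumor the test amplicon crosses threshold
# one cycle later (relative to the control) than it does in normal DNA.
estimate = relative_copy_number(
    ct_test_tumor=26.0, ct_control_tumor=24.0,
    ct_test_normal=25.0, ct_control_normal=24.0,
)
print(f"estimated copy number: {estimate:.2f}")
```

A shift of about one cycle corresponds to roughly half the template, which is the pattern expected for a hemizygous deletion in a sample consisting mostly of tumor cells; contamination by normal cells would pull the estimate back toward two.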

Statistical  analysis.  The  difference  in  the  frequencies  of  DNA 
deletion at MAP3K7 between lower-grade (Gleason scores ≤7) and
higher-grade tumors (Gleason scores ≥8) was tested using Fisher's exact
test. 
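
For readers who wish to reproduce this kind of comparison, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the software used in the study, and the two-sided convention chosen here (summing all tables no more probable than the observed one) is an assumption. The counts are those reported in this paper: 14 of 23 higher-grade and 16 of 72 lower-grade tumors with MAP3K7 deletion, and 12 of 17 versus 9 of 38 tumors with 6q deletion in the SNP array subset.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: higher-grade, lower-grade; columns: deleted, not deleted.
p_map3k7 = fisher_exact_two_sided(14, 9, 16, 56)   # all 95 tumors
p_6q = fisher_exact_two_sided(12, 5, 9, 29)        # 55 tumors on SNP arrays

print(f"MAP3K7 deletion vs. grade: P = {p_map3k7:.4f}")
print(f"6q deletion vs. grade:     P = {p_6q:.4f}")
```

Different statistical packages use slightly different two-sided definitions, so small discrepancies from the published P-values would not be surprising.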

Confirmation  of  DNA  deletion  at  MAP3K7  using  FISH.  A  subset 
of  identified  MAP3K7  deleted  tumors  was  analyzed  using  FISH  to 
confirm their deletion status. We obtained the PAC clone RP1-154G14
(~100 kb at 6q15-16.3) for the FISH analysis from the Children's
Hospital  Oakland  Research  Institute.6  The  only  known  gene  contained 
in  this  clone  is  MAP3K7.  We  grew  the  clone,  isolated  the  DNA,  and 
checked  the  identity  of  the  insert  in  the  clone  as  recommended.  The 
hybridization  mixture  contained  200  ng  of  PAC  RP1-154G14  DNA, 
which was labeled by nick translation with SpectrumOrange dUTP
following the manufacturer's protocol (Vysis, Inc.). In addition, the
hybridization mixture contained two additional probes: the
centromeric probe CEP6 labeled with SpectrumGreen (Vysis) and the LSI
MYB probe labeled with SpectrumAqua (Vysis), which bracket the PAC
RP1-154G14 clone (Fig. 2B). Paraffin-embedded normal prostate and
prostate cancer sections (5 μm) were pretreated, hybridized, and
washed  following  the  manufacturer's  protocol  (Vysis).  FISH  analysis 
was  done  using  fluorescent  microscopy  with  the  appropriate  filters  to 
visualize  the  three  probes.  Slides  were  blinded,  and  then  for  each 
sample,  a  total  of  200  interphase  cells  was  analyzed  independently  by 
two  individuals. 

MAP3K7 protein expression using immunohistochemistry. For
immunohistochemical staining of proteins, sections were deparaffinized by
successive  incubation  in  xylene,  100%  ethanol,  and  90%  ethanol 
following  standard  procedures.  The  endogenous  peroxidase  activity  was 
blocked by incubation for 20 min at room temperature in 0.5% H2O2
in  water.  Sections  were  washed  thrice  in  PBS.  Retrieval  of  antigens  was 
done  by  incubating  the  sections  in  antigen  retrieval  solution  (Sigma) 
for  1  h  in  a  95°C  water  bath.  After  allowing  the  sections  to  cool, 
samples  were  washed  in  PBS  and  blocked  with  3%  bovine  serum 
albumin  in  PBS  for  30  min  at  room  temperature.  After  blocking, 
sections were incubated with anti-transforming growth factor-β-
activated kinase 1 (Tak1) primary antibody (1:50 dilution; Abcam,
Inc.) for 1 h at room temperature and washed. Sections were then
incubated with the goat anti-rabbit secondary antibody with a
horseradish peroxidase conjugate (Jackson ImmunoResearch Laboratories,
Inc.), washed, and then incubated with diaminobenzidine for
approximately  2  to  5  min.  Following  counterstain  with  hematoxylin 
(Sigma)  and  clearing  of  the  sections  through  ethanol  and  xylene, 
coverslips  were  mounted  using  Permount  medium. 

Results  and  Discussion 

The  use  of  high-resolution  SNP  arrays  and  allele-specific 
analyses  improved  our  ability  to  detect  somatic  DNA  deletions 
in prostate tumors. Detectable deletions at 6q14-21 were
observed  in  21  of  the  55  prostate  cancers  (38%)  examined 
using  the  SNP  arrays  in  this  study  (Fig.  1).  This  frequency  was 
considerably  higher  than  the  estimate  of  25%  obtained  from  a 
combined  analysis  of  891  prostate  tumors  described  in 
published  comparative  genomic  hybridization  studies  (2). 
One  reason  for  the  higher  estimate  of  deletion  in  our  study  is 
the  use  of  a  higher-resolution  detection  method.  For  example, 
we were able to detect a small deletion of ~800 kb at 6q15 in
tumor  G7-042  using  the  SNP  arrays  (Fig.  1).  Another  potential 
factor  for  the  higher  estimate  of  deletion  was  the  greater 
proportion of tumors with high-grade disease (38% with
Gleason score ≥8) in our study. We found that the frequency
of 6q deletion was significantly higher in tumors with Gleason
score ≥8 (12 of 17 tumors, 71%) than in those with Gleason scores
≤7 (9 of 38, 24%; P = 0.002). The association between 6q
deletion  and  tumor  grade  was  striking;  in  fact,  it  was  the 
strongest  among  all  the  association  tests  between  any  common 
recurrent  DNA  copy  number  changes  (defined  as  >10%) 
identified  in  our  studies  and  tumor  grade. 

The  high-resolution  SNP  arrays  and  the  ability  to  analyze 
copy  number  changes  for  each  chromosome  using  allele- 
specific  analysis  also  improved  our  ability  to  dissect  detailed 
deletion  patterns.  We  detected  three  small-size  homozygous 
deletions  among  these  21  tumors  with  6q  deletions  (Fig.  1). 
One was ~838 kb (98,588-99,426 kb) in tumor G7-019 and
contained the POU3F2 gene, one was ~265 kb (117,394-
117,659 kb) in tumor G7-048 and contained no known
gene, and the other was ~1.1 Mb (84,868-85,963 kb) in

Fig. 1. Examples of genomic deletions on chromosome 6q identified in tumor genomes using Affymetrix 100K (a and b) and 500K (c-v) SNP mapping arrays, and CNAG2.0
with genomic smoothing of 10 SNPs. a to s, allele-specific analysis using matched normal DNA as reference. t to v, non-allele-specific analysis using references automatically
selected by CNAG2.0 from our database containing 40 normal samples. d, from Nsp array only. i, from Sty array only. Hemizygous deletions outlined by green (allele-specific
analysis) or blue (non-allele-specific analysis) horizontal curves below the baseline (solid black line) are marked by green arrows. Homozygous deletions outlined by red
and green or blue horizontal curves below the baseline (solid black line) are marked by red arrows. Vertical dotted lines, minimum overlapping deleted region. w, genes in the
minimum overlapping deleted region based on the Human hg17 Assembly (University of California at Santa Cruz, National Center for Biotechnology Information Build 35).

tumor G7-030 and contained the Q9Y2L2 gene. These
homozygous  deletions,  however,  did  not  overlap.  Although 
the  significance  of  these  homozygous  deletions  in  prostate 
cancer  development  is  unclear,  their  nonoverlapping  nature 
may be more consistent with generalized genomic instability
than a specific selection for loss of these particular genomic
regions.

We also found two additional tumors (tumors G7-
026 and G7-028) with interstitial deletions at 6q (Fig. 1). The
distal deleted region of tumor G7-026, between 90,451 and
97,338 kb, provided information that was critical in defining a
minimal  overlapping  region  (see  below).  When  examining  the 
deleted  regions  shared  by  each  of  these  21  tumors,  we  found 
one  small  region,  between  90,493  and  91,310  kb,  which  was 
shared  by  all  but  one  tumor.  Interestingly,  this  shared  deleted 
region  fell  within  the  4.3  Mb  6q  deleted  region  observed  in 
LNCaP  and  seemed  to  be  consistent  with  the  minimum  loss  of 
heterozygosity  regions  reported  by  Cooney  et  al.  (8),  Srikantan 
et  al.  (9),  Hyytinen  et  al.  (10),  and  Konishi  et  al.  (11).  Only  five 
genes  are  located  within  this  region  of  817  kb,  including 


Fig. 2. Confirmation of MAP3K7 deletions in various tumors using qPCR and
FISH. A, examples of validating MAP3K7 deletions by real-time qPCR. Solid lines,
controls; dotted lines, tumors; blue, subject G8-005; red, subject G9-003. B,
ideogram of chromosome 6 showing the physical map location of the FISH probes.
Green, CEP6 probe; orange, PAC RP1-154G14 clone; aqua, LSI MYB. C, normal
prostate control with a normal signal pattern (2G2O2A). D, tumor G8-005 with a
signal pattern indicating a hemizygous deletion of the PAC RP1-154G14 clone
(2G1O2A).


MDN1,  CASP8AP2,  CX62,  BACH2,  and  MAP3K7.  To  further 
evaluate  the  significance  of  these  five  genes,  we  examined  their 
differential  expression  patterns  in  the  ONCOMINE  gene 
expression  database.7  The  expression  of  the  MAP3K7  gene 
was  significantly  lower  in  the  tumor  cells  in  comparison  with 
the  normal  tissues  of  the  prostate  (P  =  0.0001).  In  contrast,  the 
expressions  of  the  other  four  genes  in  prostate  tumors  were  not 
significantly  different  from  normal.  The  MAP3K7  gene  encodes 
Tak1, a member of the mitogen-activated protein kinase kinase
kinase  family  and  an  activator  of  c-Jun  NH2-terminal  kinase 
and  p38  mitogen-activated  protein  kinase  pathways  (12). 
Because  of  the  decreased  expression  of  MAP3K7  in  prostate 
cancer  and  its  potential  tumor  suppressor  role,  we  targeted  the 
MAP3K7  gene  in  the  remaining  analyses. 
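
The idea of a minimal overlapping region can be expressed as a simple interval intersection. The sketch below is illustrative only: apart from the distal deleted region of tumor G7-026 (90,451-97,338 kb) quoted in the text, the intervals are hypothetical and were chosen so that the intersection matches the reported 90,493-91,310 kb boundaries. The per-tumor deletion boundaries in the study come from the SNP array data and are not listed here.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # (start_kb, end_kb) on chromosome 6


def consensus_region(deletions: Dict[str, Interval]) -> Optional[Interval]:
    """Return the interval shared by every deletion, or None if there is none."""
    start = max(s for s, _ in deletions.values())  # latest start
    end = min(e for _, e in deletions.values())    # earliest end
    return (start, end) if start < end else None


deletions_kb = {
    "G7-026 (distal)": (90_451, 97_338),    # from the text
    "hypothetical A": (85_000, 91_310),     # made-up interval
    "hypothetical B": (90_493, 110_000),    # made-up interval
}

region = consensus_region(deletions_kb)
if region is not None:
    start_kb, end_kb = region
    print(f"consensus: {start_kb:,}-{end_kb:,} kb ({end_kb - start_kb} kb)")
```

A tumor whose deletion does not cover the shared interval, as described for the one exception among the 21 tumors, would make the strict intersection empty, so such a tumor has to be set aside before the intersection is taken.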

To  confirm  that  the  MAP3K7  gene  was  implicated  in  these 
6q14-21 deleted prostate tumors, we did a qPCR analysis using
a  probe  located  in  the  last  exon  of  the  MAP3K7  gene.  As 
expected,  qPCR  analysis  detected  a  hemizygous  deletion  in  the 
20  tumors  that  shared  the  817-kb  minimal  overlapped  deleted 
region  (Fig.  2A)  and  also  confirmed  that  the  remaining  tumor 
(tumor  G7-028)  does  not  have  a  deletion  at  this  region.  To 
further  confirm  the  SNP  array  and  qPCR  results,  we  did  FISH 
analysis  in  a  subset  of  samples  (Fig.  2B-D).  The  RP1-154G14 
(~100 kb and includes the MAP3K7 gene), CEP6, and LSI MYB
probes  were  hybridized  to  tumors  G8-005,  G9-003,  and  G7- 
019,  and  the  probe  signal  was  analyzed  in  200  interphase  cells 
for  each  sample  (Fig.  2C  and  D).  This  analysis  revealed  a 
pattern  consistent  with  a  hemizygous  interstitial  deletion  of  the 
MAP3K7 region of chromosome 6q15-q16.3 (2G1O2A signal
pattern, as shown in Fig. 2D) in tumor G8-005 (54%), tumor
G9-003 (82%), and tumor G7-019 (55%), whereas the normal
prostate control revealed a normal signal pattern (2G2O2A; Fig.
2C) in 94% of the cells. This analysis confirmed the SNP array
and qPCR results, which also identified a hemizygous
interstitial deletion of the MAP3K7 region of chromosome
6q15-q16.3.

To  obtain  a  better  estimate  of  MAP3K7  deletion  frequency 
in prostate tumors and assess its association with Gleason
scores,  we  used  the  same  qPCR  assay  to  examine  the  deletion 
status  in  40  additional  tumors  that  have  not  been  analyzed  by 
SNP  arrays.  The  vast  majority  of  these  tumors  had  a  Gleason 
score  of  6  or  7.  We  observed  the  MAP3K7  gene  deletion  in 
nine  of  these  tumors.  In  total,  the  MAP3K7  gene  deletion  was 
detected  in  30  of  the  95  (32%)  tumors  in  our  study  using 
either  SNP  array  or  qPCR  method.  Importantly,  the  deletion 
was  considerably  more  common  in  higher-grade  tumors 
(14 of 23 tumors with Gleason score ≥8, 61%) than in
lower-grade tumors (16 of 72 tumors with Gleason score ≤7,
22%;  Table  1).  The  difference  was  highly  significant 
(P  =  0.001).  Most  strikingly,  the  frequency  of  the  MAP3K7 
deletion  was  highest  in  Gleason  9  tumors,  occurring  in  9  of 
the  12  tumors  (75%). 
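
The comparison of deletion frequency between grade groups can be checked with a two-sided Fisher's exact test on the 2 x 2 table of deletion status by grade. The text does not say which test produced the reported P value, so the choice of Fisher's exact test is an assumption, and the function name is illustrative only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Counts from Table 1: tumors with the deletion / tumors examined.
higher_deleted, higher_total = 14, 23
lower_deleted, lower_total = 16, 72

p_value = fisher_exact_two_sided(
    higher_deleted, higher_total - higher_deleted,
    lower_deleted, lower_total - lower_deleted,
)
print(f"higher-grade: {higher_deleted / higher_total:.0%}")
print(f"lower-grade:  {lower_deleted / lower_total:.0%}")
print(f"two-sided Fisher exact P = {p_value:.4f}")
```

The two printed proportions reproduce the 61% and 22% quoted above; the P value is the quantity to compare with the reported figure.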

As a test of our genetic data, we immunostained representative
prostate tissue samples with anti-Tak1 (alias MAP3K7)
antibody.  Figure  3  shows  immunostaining  of  benign  versus 
high-grade  tumor  (Gleason  9)  in  a  sample  (tumor  G9-001) 
with  confirmed  hemizygous  deletion  of  MAP3K7.  Benign  tissue 
shows  staining  in  the  cytoplasm  and  at  the  plasma  membrane 
Table 1. MAP3K7 deletion in prostate tumors

Group               Gleason sum   No. tumors   MAP3K7 deletion
                                  examined     No. tumors   % tumors
Lower-grade tumor   6             19           4            21
                    3 + 4         36           9            25
                    4 + 3         17           3            18
                    Subtotal      72           16           22
Higher-grade tumor  8             11           5            45
                    9             12           9            75
                    Subtotal      23           14           61
All tumors                        95           30           32

of  prostatic  epithelial  cells.  The  intensity  of  staining  ranges 
from  moderate  (data  not  shown)  to  intense  (Fig.  3A).  There  are 
also  moderate  to  strongly  staining  stromal  cells  present.  In  the 
sample  depicted  in  Fig.  3,  the  immunoreactivity  present  in 
high-grade  tumor  is  greatly  diminished  (Fig.  3B),  consistent 
with  the  hemizygous  genotype  of  this  sample.  Also  note  the 
strong  immunoreactivity  of  stromal  cells  in  the  panel  of  the 
high-grade  specimen.  Two  additional  samples  with  verified 
hemizygous genotype showed similar Tak1 immunoreactivity
(data  not  shown).  These  results  are  consistent  with  the  findings 
that  MAP3K7  RNA  expression  level  is  greatly  reduced  in 
prostate  cancers  in  comparison  with  the  level  in  normal 
prostate  tissues.7 

Our  finding  that  a  small  consensus  deleted  region  in 
prostate  cancers,  including  the  MAP3K7  gene,  is  strongly 
associated  with  Gleason  score  is  significant.  The  MAP3K7  gene 
encodes Tak1. Tak1 is a member of the mitogen-activated
protein kinase kinase kinase family that was originally
identified as a key regulator of mitogen-activated protein
kinase activation in transforming growth factor-β-induced
signaling pathways (12). Mutations and alterations in several
layers of the transforming growth factor-β signaling cascade
have been identified, including ligand, receptors, and
intracellular signaling events (13). Before our genetic study, Tak1
had not been implicated in this process. Our demonstration
that the MAP3K7 locus is deleted in prostate tumor specimens
adds another step in transforming growth factor-β signaling
that  is  abrogated  in  prostate  tumorigenesis  and  further 
strengthens  the  role  of  this  pathway  in  prostate  cancer 
development. 

Although  Gleason  score  is  perhaps  the  most  reliable 
predictor  for  prostate  cancer  behavior,  it  is  far  from  ideal. 
Many  low-  to  intermediate-grade  cancers  may  be  metastatic, 
whereas  a  few  of  the  high-grade  tumors  still  have  an  indolent 
course.  Thus,  molecular  markers  that  could  further  define  the 
disease  prognosis  are  greatly  needed.  Markers  that  can  be  used, 
either  alone  or  in  conjunction  with  Gleason  score,  to  more 
precisely  identify  prostate  cancers  capable  of  progression  to 
disseminated  disease  would  be  extremely  useful  in  determining 
which  patients  to  treat  and  how  aggressively.  It  is  interesting  to 
note  that  the  association  between  MAP3K7  deletion  and 
Gleason grade is stronger than that between p53 deletion and
Gleason grade in our study. Deletion of the p53 region at
17p13 occurs in 52.94% of tumors with Gleason score ≥8 and
in 16.79% of tumors with Gleason score ≤7 (P = 0.008). In
comparison, the deletion at the MAP3K7 region occurs in 61% of
tumors with Gleason score ≥8 and only in 22% of tumors with
Gleason score ≤7 (P = 0.001). Further work in larger patient
populations with follow-up information will be needed to
assess the potential of MAP3K7 deletion as a prognostic
marker.

Although  we  have  shown  the  significance  of  the  MAP3K7 
deletion  in  prostate  cancer,  these  results  are  subject  to  several 
caveats. First, our data provided strong statistical evidence
only for an association between a consensus deleted region,
including  the  MAP3K7  gene,  and  tumor  grade;  we  did  not 
assess  the  causal  relationship  between  MAP3K7  gene  and 
aggressive  prostate  cancer.  Other  in  vitro,  in  vivo,  and  animal 
studies  are  needed  to  understand  its  molecular  mechanisms. 
Second,  the  current  study  targeted  only  the  MAP3K7  gene, 
and  therefore,  the  roles  of  four  other  genes  at  this  minimum 
overlap  region  in  prostate  cancer  development  remain  to  be 


Fig. 3. Tak1 immunoreactivity is decreased in high-grade prostate tumors. Sections
(5 μm) were stained with rabbit anti-Tak1 antibody followed by labeling protocols
using goat anti-rabbit horseradish peroxidase secondary antibody as described in Materials
and Methods. Controls that lacked primary antibody were clear (data not shown).
The images shown were from different areas of the same section of a specimen
verified by 100K SNP array and qPCR to be hemizygous for the MAP3K7 locus.
A, benign region. B, Gleason 5 tumor cells. Note the benign gland in the lower right-hand
corner of (B) showing positive Tak1 immunoreactivity.
further investigated. Finally, our results differ from a previous
study that found no correlation between loss of heterozygosity
at 6q and MAP3K7 transcript expression (11). The
differences  in  the  deletion  detection  methods  [loss  of 
heterozygosity  using  microsatellite  markers  in  the  study  of 
Konishi  et  al.  (11)  versus  SNP  arrays  and  qPCR  in  our  study] 
and  sample  size  [21  subjects  in  the  study  of  Konishi  et  al. 
(11)  versus  95  patients  in  our  study]  may  contribute  to  the 
contradictory  findings. 


In  summary,  our  study  provides  evidence  that  deletion  of  an 
817-kb region of 6q15, including the MAP3K7 gene, is one of
the  most  consistent  genetic  alterations  occurring  in  the  genome 
of  high-grade  prostate  cancers.  Although  MAP3K7  represents 
an  important  candidate  prostate  tumor  suppressor  gene 
affected  by  this  genomic  alteration,  further  studies  will  be 
necessary  to  establish  a  causal  relationship  between  this  gene 
and/or  the  other  genes  in  this  interval  and  prostate  cancer 
initiation  and/or  progression. 
Multiple Genomic Alterations on 21q22 Predict
Various TMPRSS2/ERG Fusion Transcripts in
Human Prostate Cancers


Wennuan Liu,1 Charles M. Ewing,2 Bao-Li Chang,1 Tao Li,1 Jishan Sun,1 Aubrey R. Turner,1 Latchezar Dimitrov,1
Yi Zhu,1 Jielin Sun,1 Jin Woo Kim,1 S. Lilly Zheng,1 William B. Isaacs,2* and Jianfeng Xu1*

1Center for Human Genomics, Wake Forest University School of Medicine, Winston-Salem, NC
2Johns Hopkins Medical Institutions, Baltimore, MD

A number of TMPRSS2/ERG fusion transcripts have been reported since the discovery that recurrent genomic rearrangements
result in the fusion of TMPRSS2 and ETS family member genes. In this article we present evidence demonstrating that multiple
genomic alterations contribute to the formation of various TMPRSS2/ERG transcripts. Using allele-specific analysis of the data
generated from the GeneChip 500K SNP array, we observed both hemizygous and homozygous deletions occurring at different
locations between and within TMPRSS2 and ERG in prostate cancers. The 500K SNP array enabled us to fine map the start
and end of each deletion to specific introns of these two genes, and to predict a variety of fusion transcripts, including a new
form, which was confirmed by sequence analysis of the fusion transcripts in various tumors. We also inferred that translocation
is an additional mechanism of fusion for these two genes in some tumors, based on largely diploid genomic DNA between
TMPRSS2 and ERG, and different fusion transcripts produced in these tumors. Using a bioinformatics approach, we then uncovered
the consensus sequences in the regions harboring the breakpoints of the deletions. These consensus sequences were
homologous to the human Alu-Sq and Alu-Sp subfamily consensus sequences, with more than 80% homology. The presence/absence
of Alu family consensus sequence in the introns of TMPRSS2 and ERG correlates with the presence/absence of fusion
transcripts of these two genes, indicating that these consensus sequences may contribute to genomic deletions and the fusion
of TMPRSS2 and ERG in prostate cancer. © 2007 Wiley-Liss, Inc.


INTRODUCTION 

The  recent  discovery  of  recurrent  chromosomal 
rearrangements  resulting  in  the  fusion  of  androgen 
regulated  TMPRSS2  to  transcripts  of  ETS  family 
member  genes  in  the  majority  of  prostate  cancers 
has  not  only  firmly  established  the  existence  of 
genomic  rearrangements  occurring  in  epithelial 
malignancies, but could also shed light on the progression
of this clinically heterogeneous disease. In
a study analyzing paired benign and malignant
prostate epithelial cells from 114 prostate cancer
patients using expression microarray, Petrovics
et al. (2005) identified the ETS-related gene ERG
as the most frequently overexpressed proto-oncogene
in the transcriptome of malignant prostate
epithelial cells. In many cases, the overexpression
of the ETS family of genes, including ERG, ETV1,
and ETV4, is apparently caused by the fusion of
TMPRSS2, a gene upregulated by androgens in
prostate cancer cells, to ETS transcription factor
genes (Tomlins et al., 2005; Cerveira et al., 2006;
Perner et al., 2006; Wang et al., 2006; Yoshimoto
et al., 2006; Clark et al., 2007). Although no clinical
characteristics related to the fusion were observed
by Yoshimoto et al. (2006), both Perner et al.
(2006) and Wang et al. (2006) have reported a positive
association between TMPRSS2/ERG fusion
and clinical outcomes of prostate cancer progression.
While there is marked heterogeneity in the
expressed forms of TMPRSS2/ERG transcripts in
different tumors (Clark et al., 2007), Wang et al.
(2006) found that the expression of various fusion
transcripts was correlated with clinical outcome of
prostate cancers. It is possible that these markedly
different isoforms may, at least in part, reflect differences
in genomic architecture resulting from
translocations and genomic fusion at different positions.
While the causes of these translocations on chromosome
21 in prostate tumors remain to be elucidated,
we and others have demonstrated the existence
of genomic deletions between TMPRSS2
and ERG using 100K SNP arrays and FISH (Liu
et al., 2006; Perner et al., 2006; Yoshimoto et al.,
2006). Although the DNA breakpoints were not
identified in these primary tumors due to the limited
resolution of the techniques used in these
studies, Hermans et al. (2006) have mapped the
breakpoint in intron 2 of ERG and introns 1 and 2
of TMPRSS2 in xenograft DNAs using long-range
PCR. In addition, the effects of the loss of one
copy of the genes located between TMPRSS2 and
ERG remain to be fully evaluated. However,
knocking out HMGN1, one of the 14 known genes
located between TMPRSS2 and ERG, in a mouse
model has been shown to result in increased levels
of N-cadherin expression (Rubinstein et al., 2005),
which is a characteristic of high-grade prostate cancer
(Tomita et al., 2000).

To  further  characterize  the  genomic  alterations 
involved  in  the  fusion  of  these  two  genes,  we  now 
report  our  findings  from  41  pairs  of  primary  prostate 
tumors and matched nonmalignant tissues, evaluated
using the GeneChip Mapping 500K SNP array
and  allele-specific  analysis.  Our  results  demonstrate 
that  multiple  genomic  alterations/translocations,  at 
least  in  part,  contribute  to  the  diversity  of  the  fusion 
transcripts  found  in  various  prostate  cancers. 

MATERIAL  AND  METHODS 
Study  Subjects 

All subjects for the somatic DNA deletion analysis
of prostate tumors were prostate cancer patients
undergoing radical prostatectomy (RP) for treatment
of clinically localized disease at Johns Hopkins
Hospital. All subjects participated in an IRB-approved
protocol. We selected 41 subjects from
whom genomic DNA of sufficient amount (>5 μg)
and purity (>70% cancer cells for cancer specimens,
no detectable cancer cells for normal samples)
could be obtained by dissection of matched nonmalignant
(hereafter referred to as “normal”) and
cancer-containing areas of prostate tissue as determined
by histological evaluation of H&E-stained frozen
sections of snap-frozen RP specimens. Genomic
DNA was isolated from trimmed frozen tissues as
previously described (Bova et al., 1993).

GeneChip  Mapping  500K  SNP  Assay 

The GeneChip Mapping 500K set is comprised
of Nsp (~262,000 SNPs) and Sty (~238,000 SNPs)
arrays. The median physical distance between
SNPs is ~2.5 kb and the average distance between
SNPs is ~5.8 kb. The 500K SNP arrays were purchased
from Affymetrix, Santa Clara, CA. All of the
reagents used for the assay were obtained from
manufacturers recommended by Affymetrix. The
500K SNP mapping arrays were labeled, hybridized,
washed, and stained according to the manufacturer’s
instructions. Briefly, 250 ng of genomic
DNA was digested with either NspI or StyI and
then ligated to adapters. A generic primer that recognizes
the adapter sequence was used to amplify
adapter-ligated DNA fragments with PCR conditions
optimized to preferentially amplify fragments
in the 200 to 1,100 bp size range in a GeneAmp
PCR System 9700 (Applied Biosystems, Foster
City, CA). After purification with a Clontech DNA
amplification clean up kit, a total of 90 μg of PCR
product was fragmented and a sample of the fragmented
product was visualized on a 4% TBE agarose
gel to confirm that the average size was
smaller than 180 bp. The fragmented DNA was
then labeled with biotin and hybridized to the
arrays for 18 hr. The arrays were washed and
stained using an Affymetrix Fluidics Station 450
and scanned using a GeneChip Scanner 3000 7G
(Affymetrix). The allele intensity of each SNP was
measured using the GeneChip Genotyping analysis
software (GTYPE). DNA copy number was calculated
based on allele intensity using two different
software packages: Copy Number Analyzer for Affymetrix
GeneChip (CNAG2.0; Nannya et al., 2005),
and dChip analyzer (dChip; Lin et al., 2004). Allele-specific
analysis was also performed to estimate
DNA copy number for each of the heterozygous alleles
using CNAG2.0. The physical positions of the
detected deletions were determined based on the
Human hg17 Assembly (NCBI Build 35).

Quantitative  PCR  (qPCR) 

qPCR was performed using the ABI Prism 7500
Sequence Detection System. Primers were designed
using Primer Express 1.5 software from
Applied Biosystems. Amplicons were designed
against the locus that was putatively homozygously
deleted in the tumor and a control locus of known
normal DNA copy number (CN). The sequences of the primers
for TMPRSS2 and ERG are available upon request.
PCR kinetics at the control locus were used to control
for sample-to-sample differences in genomic
DNA purity and concentration. Three concentrations
of each genomic DNA sample (20, 10, and 5
ng) were assayed in duplicate, using each pair of
real-time PCR primers. PCRs were prepared as follows:
in 20 μl, we combined 2 μl of genomic DNA,
0.05 μM of each primer, and SYBR-Green PCR
Master Mix from Applied Biosystems. PCR reactions
were performed as follows: 95°C for 10 min,
followed by 40 cycles at 95°C for 20 sec, and 60°C
for 1 min. An additional cycle of 95°C for 15 sec,
60°C for 20 sec, and 95°C for 15 sec was run at the
end to measure the dissociation curve for quality
control. We used Sequence Detection Software
(SDS) for PCR baseline subtraction and then
exported the threshold cycle number (Ct) data for
analysis. Ct values of the control and test amplicons
for the three dilutions of each DNA sample
were plotted against each other, and the offsets
between the two samples along the control-amplicon
axis and the test-amplicon axis were measured to
determine the status of genomic deletions.
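
The offset measurement described in this section can be sketched as follows. The sketch fits a least-squares line through the (control Ct, test Ct) points of each sample and reads the offset along the test-amplicon axis at a control Ct of 25, the reference point given in the Figure 1 legend. A second helper gives the offset expected for a given tumor copy number, under two assumptions that are not stated in the text: every PCR cycle exactly doubles the product, and a fixed fraction of the DNA comes from contaminating normal cells. All function names and Ct values are hypothetical.

```python
from math import log2

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def delta_ct(normal, tumor, control_ct=25.0):
    """Offset between tumor and normal best-fit lines along the
    test-amplicon axis, read at a fixed control-amplicon Ct.

    `normal` and `tumor` are lists of (control_ct, test_ct) pairs,
    one pair per DNA dilution.
    """
    def test_ct_at(points):
        slope, intercept = fit_line(*zip(*points))
        return slope * control_ct + intercept
    return test_ct_at(tumor) - test_ct_at(normal)

def expected_delta_ct(tumor_copies, normal_fraction):
    """Expected offset if each cycle doubles the product (assumption)."""
    mean_copies = normal_fraction * 2 + (1 - normal_fraction) * tumor_copies
    return -log2(mean_copies / 2)

# Hypothetical Ct values for three dilutions: (control Ct, test Ct).
normal_pts = [(24.0, 24.5), (25.0, 25.5), (26.0, 26.5)]
tumor_pts = [(24.2, 25.4), (25.2, 26.4), (26.2, 27.4)]

print(f"measured offset: {delta_ct(normal_pts, tumor_pts):.2f}")
print(f"expected, hemizygous, 25% normal DNA: {expected_delta_ct(1, 0.25):.2f}")
print(f"expected, homozygous, 25% normal DNA: {expected_delta_ct(0, 0.25):.2f}")
```

Under these assumptions a hemizygous deletion with 25% normal DNA gives an offset of about 0.68 cycles, the same value used as the cutoff in the Figure 1 legend, and a homozygous deletion gives about 2 cycles.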

Analysis  of  Fusion  Transcripts 

Total RNA was isolated from 10-μm sections of
prostate tumors using TRIZOL reagent (Invitrogen,
Carlsbad, CA) according to the manufacturer’s
instructions. We synthesized cDNA from
the RNA using a cDNA Synthesis System from
Roche Applied Science (Indianapolis, IN) according
to the manufacturer’s protocol. RT-PCR was
performed in a 50 μl reaction volume for 1 cycle at
95°C for 2 min, then 38 cycles at 95°C for 30 sec,
58°C for 30 sec and 72°C for 1 min, and ending
with 10 min at 72°C. We used ERG-V_F
(cacggttaatgcatgctagaa) and ERG-V_R
(ggttgagcagctttcgactg) for expression analysis of ERG variant 1
(NM_182918) and ERG variant 2 (NM_004449).
We used the primer pairs t-e_A F
(CAGGAGGCGGAGGCGGA) and t-e_A R
(GGCGTTGTAGCTGGGGGTGAG), and t-e_C F1
(CAGCAAGATGGCTTTGAACTC) and t-e_C R1
(GGATCTGCTGGCACGATAAC), for analyses of
the fusion transcripts in various tumors.

The sequencing reactions were performed using
dye-terminator chemistry (BigDye, ABI, Foster
City, CA). The reaction was performed in a total
volume of 5 μl with about 10-30 ng of purified
PCR products, 0.5 μl of BigDye and 0.16 μM of forward
or reverse PCR primers (t-e_A F, t-e_A R,
t-e_C F: GATAACAGCAAGATGGCTTTG,
t-e_C R: GATCTGCTGGCACGATAACTC). The
sequencing cycling conditions were as follows:
95°C for 30 sec; followed by 22 cycles of 95°C for
30 sec, 50°C for 10 sec and 60°C for 4 min. After
sequencing cycling the samples were precipitated
using 63 ± 5% ethanol. Then 10 μl of Hi-Di formamide
(ABI, Foster City, CA) was added before
samples were loaded onto an ABI 3730 DNA Analyzer.
Sequencher™ software version 4.1.4 (Gene
Codes Corporation) was used for sequence analysis.

RESULTS 

Genomic  Deletions  Between  TMPRSS2  and  ERG 

In a previous study using the 100K SNP array,
we noticed that the centromeric boundaries of
deletions between TMPRSS2 and ERG varied
among different prostate tumors, while it was
impossible to compare the boundaries on the telomeric
side because of poor resolution. In this study,
we increased the number of study subjects by
including 33 normal-tumor pairs of new samples,
and we used the 500K mapping array to refine the
boundaries of various deletions in this region. In
order to elucidate the detailed genomic structures
involved in the deletions, we also included eight of
the tumors that had been previously analyzed
using the 100K SNP array. These include G6-002T,
G6-003T, G7-002T, G7-009T, G7-017T, G7-032T,
G8-001T, and G8-002T. While we did not
present the deletions between TMPRSS2 and ERG
(T-E deletion) in G6-002T, G7-002T, G7-032T,
and G8-002T in our previous publication, we
reported T-E deletions in the rest of the tumors
(Liu et al., 2006). Using the 500K mapping array in
this study, we not only refined T-E deletions in
G6-002T, G7-002T, G7-032T, and G8-002T but
also identified 8 more subjects harboring T-E deletions
among the 33 pairs of new samples (Fig. 1).

Homozygous  and  Hemizygous  Deletions 
Revealed  by  Allele-Specific  Analysis 

To characterize the deletions involving TMPRSS2,
ERG, and the genes in between, we employed allele-specific
analysis as described by Nannya et al. (2005).
The algorithm of allele-specific analysis takes advantage
of genotype information and allele-specific intensities
from paired samples to estimate DNA copy
numbers for each heterozygous SNP. Allele-specific
analysis can also minimize the effect of normal DNA
contamination from nonmalignant cells in the tumors
on the identification of DNA copy number changes,
because the other allele can be used as an internal
control for hemizygous deletion or amplification.
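
To make the idea of the retained allele acting as an internal control concrete, the sketch below computes per-allele copy number at one heterozygous SNP from paired tumor and normal intensities. It is a simplified illustration and not the CNAG2.0 algorithm: it assumes the intensities are already normalized and proportional to copy number, and all function names and values are hypothetical.

```python
def allele_specific_copy_number(normal, tumor):
    """Estimate per-allele copy number at one heterozygous SNP.

    `normal` and `tumor` are (A, B) allele intensities. In the normal
    sample each allele of a heterozygous SNP is present in one copy, so
    the tumor/normal ratio for an allele approximates its copy number.
    """
    (na, nb), (ta, tb) = normal, tumor
    return ta / na, tb / nb

def expected_tumor_intensity(normal, copies, normal_fraction):
    """Simulate tumor intensities for the given per-allele copies in
    tumor cells, diluted by a fraction of contaminating normal cells
    (which carry one copy of each allele)."""
    return tuple(
        n * (normal_fraction * 1 + (1 - normal_fraction) * c)
        for n, c in zip(normal, copies)
    )

normal = (1000.0, 800.0)  # hypothetical normalized A and B intensities
# Hemizygous deletion removing allele B, with 25% normal-cell contamination.
tumor = expected_tumor_intensity(normal, copies=(1, 0), normal_fraction=0.25)
cn_a, cn_b = allele_specific_copy_number(normal, tumor)
print(f"allele A: {cn_a:.2f} copies, allele B: {cn_b:.2f} copies")
```

In this example the total copy number falls only from 2 to 1.25, whereas the per-allele view shows one allele unchanged and the other reduced to the contamination level, which is why the retained allele is useful as a reference.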

Using  allele-specific  analysis,  we  analyzed  a  total 
of  41  pairs  of  samples  from  primary  prostate  tumors 
and  matched  nonmalignant  tissues  as  described 
above  with  16  tumors  harboring  T-E  deletion 
(Supplementary Table 1, http://www1.wfubmc.edu/Genomics/Publications+and+Data).
Of the genomic losses between TMPRSS2 and ERG, 13 tumors
apparently harbored hemizygous deletions as shown
Figure 2. Scatter plot of raw log2 ratios for the SNP probes
between TMPRSS2 and ERG. (a) log2 ratio of nonallele-based analysis of
chromosome 21 with 10-SNP genomic smoothing in G9-009T showing
the relative position of the deleted region to be analyzed. (b-e) Raw
log2 ratios of SNP probes between TMPRSS2 and ERG. The genomic
DNA between the SNPs rs457920 and rs2836656 was apparently
retained in G8-004T (b), while at least one copy of this region was deleted
in G8-001T (c), G9-009T (d), and G7-003T (e). Blue dots, raw data of
log2 ratios from Nsp array. Pink dots, raw data of log2 ratios from Sty
array. The y-axis represents the log2 ratio obtained using CNAG2.0.
The arrows point to SNPs that map to the regions containing the
breakpoints of the deletions. [Color figure can be viewed in the online
issue, which is available at www.interscience.wiley.com.]


Figure 1. DNA copy number changes on chromosome 21 revealed
by allele-specific analysis of the genotype and allele intensity data generated
from the GeneChip 500K SNP assay using CNAG2.0 in prostate
tumors. (a-d) log2 ratio analysis with 10-SNP genomic smoothing
shows various genomic deletions between ERG and TMPRSS2, as
detected by two independent Nsp and Sty arrays. The consistency of
the results from these two independent assays confirmed the deletion
in this region. (e-r) Allele-specific analysis of the 500K (Nsp and Sty)
SNP data illustrates genomic deletions between ERG and TMPRSS2 in
different prostate tumors. G6-002T (k), G6-003T (e), and G7-017T (i)
were analyzed only in the Sty SNP array. Red and green curves represent
different alleles. (s-u) Validating deletions of ERG (s), MX1 (t), and
TMPRSS2 (u) in tumor G9-009T by real-time quantitative PCR. Ct values
of the control (GAPDH, x-axis) and test (y-axis) amplicons for the
three dilutions of each DNA sample were plotted against each other,
and the offsets between best-fit lines for the samples along the
test-amplicon axis at 25 Ct of the control-amplicon axis were measured.
The offsets in the Ct values (ΔCt) between tumor DNAs with putative
ERG, MX1, or TMPRSS2 deletions and paired normal DNAs without
deletion in these regions are used to infer DNA copy number change.
Tumor DNA is defined as having a hemizygous deletion when the ΔCt is
less than 0.68 and as having a homozygous deletion when the ΔCt is
more than 0.68, assuming 25% normal DNA contamination is common
in prostate cancers. Blue, normal DNA; Pink, tumor DNA. [Color
figure can be viewed in the online issue, which is available at
www.interscience.wiley.com.]

in Figure 1, including G6-002T, G6-003T, G7-002T,
G7-003T, G7-013T, G7-016T, G7-017T, G7-024T,
G7-029T, G7-032T, G8-002T, G8-003T, and
G9-004T. The results from independent assays
using the Nsp array and the Sty array were very consistent
(Figs. 1a-1d), and this consistency was used
to confirm the genomic loss of one copy in this
region. While most of the hemizygous deletions
among these 13 tumors appeared to be similar, as
shown in Figures 1a-1r, two of them (Figs. 1f and
1q) were apparently different in terms of the size
and breakpoint positions of the deletions.

Among these 16 tumors with T-E deletions,
three appeared to harbor genomic losses that
affected both of the alleles of TMPRSS2, ERG, and
other genes in between (Fig. 1m, G9-009T; Fig.
1n, G8-001T; and Fig. 1q, G8-004T). Analyzing
the raw data of log2 ratio as shown in Figure 2, we
found that the homozygous deletion in G8-004T
also affected MX1 in addition to TMPRSS2 and
ERG. We confirmed the homozygous deletions
using qPCR as shown in Figures 1s-1u. All of the
other genes between TMPRSS2 and ERG, including
ETS2, DSCR2, BRWD1, HMGN1, SH3BGR,
WRB, B3GALT5, PCP4, DSCAM, BACE2, PLAC4,
FAM3B, and MX2, were apparently retained in the
genome of G8-004T. The raw data of log2 ratio
also revealed different breakpoints involving both
TMPRSS2 and ERG in different tumors (Fig. 2),
which indicates that different breakpoint positions
may contribute to the variety of fusions of
TMPRSS2 and ERG observed in different prostate
tumors.

DNA  Breakpoints  Involved  in  Different  Regions 
Within  both  TMPRSS2  and  ERG  Genes 

In  order  to  assess  the  contribution  of  the 
genomic  regions  harboring  these  breakpoints  to 
the  marked  differences  in  the  transcripts  of  the 
fusion  gene,  we  analyzed  the  raw  probe  intensity 


Genes, Chromosomes & Cancer DOI 10.1002/gcc

Figure 3. Mapping the breakpoints of genomic deletions in different
tumors to various introns of TMPRSS2 and ERG. (a) Relative positions of
the exons and introns in TMPRSS2 and ERG (UCSC hg17, NM_005656.2,
NM_004449.3). (b-e) The raw data of log2 ratios, showing the regions
involved in DNA copy number changes that harbor the breakpoints of
genomic deletions, as well as the SNP probes that map to these regions:
(b) G8-004T, (c) G8-001T, (d) G9-009T, and (e) G7-003T. Blue diamonds,
raw data of log2 ratios from Nsp array. Pink squares, raw data of log2
ratios from Sty array. The y-axis represents the log2 ratio obtained using
CNAG2.0, and the x-axis represents the physical position of the SNPs
(UCSC hg17). The arrows point to the SNPs that map to the regions
containing the breakpoints of the deletions. The vertical dotted lines
depict the introns where the SNPs are located. [Color figure can be viewed in the
online issue, which is available at www.interscience.wiley.com.]


data of log2 ratios generated from CNAG2.0 for
the tumors with the deletions described above,
and present some examples of the results in Figure 3.
We also used a 3-SNP genomic smoothing
approach to reduce the noise of the raw data in
order to identify the breakpoint SNPs as presented
in Supplementary Table 2
(http://www1.wfubmc.edu/Genomics/Publications+and+Data). In the
tumor G9-009T (Fig. 3d), a change of log2 ratio in
TMPRSS2 apparently occurred between the first
and the second exons, while an apparent shift of
log2 ratio occurred between exons 3 and 4 in ERG
(marked by vertical dotted lines). In G7-003T (Fig.
3e) the obvious decrease of log2 ratios occurred
between exons 2 and 3 in ERG, while an apparent
shift of log2 ratio occurred between exons 1 and 4
in TMPRSS2. In G8-004T and G8-001T (Figs. 3b
and 3c, respectively), the changes of log2 ratios in
ERG clearly occurred between exons 3 and 4,
while the shift of log2 ratio was observed between
exons 1 and 4 in TMPRSS2. By analyzing both the
raw and smoothed hybridization intensity of probes
at each SNP between TMPRSS2 and ERG, we
found that the deletions started and ended at various
locations in different tumors (Figs. 1-3).
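
The smoothing and breakpoint localization described in this section can be sketched as a moving average over consecutive SNPs, followed by a search for the pair of adjacent SNPs across which the mean log2 ratio shifts the most. This is a minimal illustration on made-up values; it is not the procedure implemented in CNAG2.0, and the function names, positions, and ratios are hypothetical.

```python
def smooth(values, window=3):
    """Moving average over `window` consecutive SNPs (shorter at the edges)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def breakpoint_interval(positions, log2_ratios):
    """Return the two adjacent SNP positions flanking the split that
    maximizes the difference in mean log2 ratio between the two sides.
    Assumes a single copy number change within the window examined."""
    best_i, best_shift = 1, -1.0
    for i in range(1, len(log2_ratios)):
        left, right = log2_ratios[:i], log2_ratios[i:]
        shift = abs(sum(left) / len(left) - sum(right) / len(right))
        if shift > best_shift:
            best_i, best_shift = i, shift
    return positions[best_i - 1], positions[best_i]

# Hypothetical SNP positions (bp) and raw log2 ratios across a deletion edge.
positions = [100, 2600, 5200, 7900, 10300, 12800, 15400, 18000]
raw = [0.05, -0.10, 0.08, -0.02, -0.55, -0.70, -0.62, -0.66]

print([round(v, 2) for v in smooth(raw)])
print(breakpoint_interval(positions, raw))
```

The breakpoint can only be placed between two neighboring SNPs, so the resolution of the estimate is set by the local SNP spacing on the array.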

Breakpoints  of  Genomic  Deletions  at  Multiple 
Locations  Correlate  with  Various  Forms 
of  Fusion  Transcripts 

The  high  resolution  of  the  500K  SNP  array 
enabled  us  to  map  the  regions  where  the  deletions 
started  and  ended  to  the  introns  within  TMPRSS2 
and  ERG  for  most  of  the  tumors.  Therefore,  it  was 
possible  for  us  to  predict  the  structure  of  the 
potential transcripts that would result from the fusion
genes. For example, in tumor G9-009T the
deletion  between  SNPs  rs2838043  and  rs2226683 
(Fig.  3d)  would  create  a  fusion  between  the  first 
TABLE 1. Expression of TMPRSS2/ERG Fusion Transcripts
in Human Prostate Tumors

Column headings: Subject; T1/ERG; T1-2/ERG; ERG v1; ERG v2.
Entries for each subject are listed in column order; empty cells are omitted.

Vcap          T1/E4-11; +; +
Nl (650N)     +
1647 Ibbhl    +
1000 aaa      T1/E4-11; +; +
G7-003T       T1-2/E3-11; +; +
G7-004T       T1/E4-11; T1-2/E4-11; +
G7-029T       T1/E4-11; +; +
G7-016T       T1/E4-11; +
G7-042T       T1/E4-11; T1-2/E4-11; +; +
G8-003T       T1/E4-11; +; +
G8-004T       T1/E4-11; T1-2/E4-11; +
G9-004T       T1/E4-11; +; +
G9-005T       +
G9-007T       T1/E4-11; T1-2/E4-11; +; +
G9-008T       T1/E4-11; +
G9-010T       T1-2/E4-11

Vcap, cell line; Nl (650N), nonmalignant tissues; 1000 aaa, positive (+)
control for T1/E4-11; 1647 Ibbhl, positive (+) control for T1-2/E4-11;
ERG v1, ERG variant 1 (NM_182918); ERG v2, ERG variant 2
(NM_004449.3).

intron of TMPRSS2 and the third intron of ERG,
which would result in the most frequent fusion
transcript of T1/E4-11 (Fig. 5) as described previously
(Tomlins et al., 2005; Wang et al., 2006; Clark
et al., 2007). This was confirmed by analysis of the
fusion transcripts in the tumors G9-008T, G9-007T,
G9-004T, G8-004T, G8-003T, G7-016T, and
G7-029T (Table 1). In G7-003T (Fig. 3e), the
deletion between the SNPs rs4817953 and
rs10154090 would be expected to create a fusion
between the second intron of ERG and the 1st, 2nd, or
3rd intron of TMPRSS2. Because the region
that harbors the deletion breakpoint lies between
the two SNPs (rs2838043 and rs10154090) and thus
covers three introns, with the 2nd and the 3rd
exons of TMPRSS2 in between, it was impossible
for us to map the breakpoint to a narrower region
based upon the currently available resolution of
the 500K SNP array. Nevertheless, the fusion of
genomic DNA via this deletion could create a
fusion gene producing transcripts T1/E3-11,
T1-2/E3-11, or T1-3/E3-11, depending on where the
breakpoint occurs in TMPRSS2. To confirm the
transcript predicted from the genomic deletion,
we analyzed the fusion
transcript in this tumor using two sets of RT-PCR
primers. The first set, t-e_A, is located in the 1st
exon of TMPRSS2 and the 6th exon of ERG. The
second set, t-e_C, is located in the 2nd exon of
TMPRSS2 and the 6th exon of ERG. The size and
the sequence of the PCR product confirmed that
T1-2/E3-11, a new form of fusion transcript, was


Figure 4. Analysis of TMPRSS2/ERG transcripts in different prostate
tumors. (a) RT-PCR products resolved in 1.5% agarose gel. t-e_A primers:
TMPRSS2 (cDNA NM_005656) 12-28: CAGGAGGCGGAGGCGGA,
ERG (cDNA NM_004449) 762-742: GGCGTTGTAGCTGGGGGTGAG;
t-e_C primers: TMPRSS2 (cDNA NM_005656) 121-141:
CAGCAAGATGGCTTTGAACTC, ERG (cDNA NM_004449)
599-580: GGATCTGCTGGCACGATAAC. Vcap, cell line as positive
control. (b) Sequence analysis of the fusion transcripts in G7-003 and
G8-004. Blue horizontal bar, TMPRSS2. Red horizontal bar, ERG. [Color figure
can be viewed in the online issue, which is available at
www.interscience.wiley.com.]

produced from the fusion gene in this tumor
(Fig. 4). In G8-004T (Fig. 3b), the deletion
between SNPs rs16996350 and rs10154090 would
be expected to create a fusion between the 3rd
intron of ERG and the 1st, 2nd, or 3rd intron of
TMPRSS2. While we could not map the breakpoint
to a narrower region for the same reasons described
above, the fusion of genomic DNA via this deletion
could create a gene producing fusion transcripts
consisting of T1/E4-11, T1-2/E4-11, or T1-3/E4-11.
Size and sequence analyses of the transcripts
revealed that both T1/E4-11 and T1-2/E4-11
were produced in this tumor (Fig. 4). Therefore,
the deletions with breakpoints mapped to different
locations in various introns within TMPRSS2 and
ERG as presented in Figure 3 contributed to the
production of different forms of the fusion transcripts
observed in various prostate tumors.
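
The naming logic used above to predict a transcript from the introns that harbor the breakpoints can be sketched as follows. This is an illustrative sketch only, not the authors' software: the function names are hypothetical, and the sketch assumes that intron i lies between exons i and i + 1, that ERG has 11 exons (as stated in the Discussion), and that no alternative splicing alters the retained exons.

```python
ERG_EXONS = 11  # number of ERG exons, as stated in the text


def predicted_transcript(tmprss2_intron, erg_intron, erg_exons=ERG_EXONS):
    """Name the fusion transcript expected for a pair of breakpoint introns.

    A TMPRSS2 breakpoint in intron i retains TMPRSS2 exons 1..i; an ERG
    breakpoint in intron j retains ERG exons j+1..last.
    """
    if tmprss2_intron < 1 or not 1 <= erg_intron < erg_exons:
        raise ValueError("intron index out of range")
    t_part = "T1" if tmprss2_intron == 1 else f"T1-{tmprss2_intron}"
    return f"{t_part}/E{erg_intron + 1}-{erg_exons}"


def candidate_transcripts(tmprss2_introns, erg_intron):
    """List candidates when the TMPRSS2 breakpoint spans several introns."""
    return [predicted_transcript(i, erg_intron) for i in tmprss2_introns]


# G9-009T: first intron of TMPRSS2, third intron of ERG
print(predicted_transcript(1, 3))            # T1/E4-11
# G7-003T: intron 1, 2, or 3 of TMPRSS2, second intron of ERG
print(candidate_transcripts([1, 2, 3], 2))   # ['T1/E3-11', 'T1-2/E3-11', 'T1-3/E3-11']
```

When the array resolution leaves several TMPRSS2 introns as candidates, the list returned by `candidate_transcripts` corresponds to the set of transcripts that RT-PCR then has to discriminate.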

Expression  of  Fusion  Transcripts  Correlated  with 
Transcription  Variant  2  of  ERG 

There are two transcript variants of ERG according
to the UCSC RefSeq Gene database (NM_182918,
NM_004449). We analyzed the expression of these
two variants and various TMPRSS2/ERG fusion
transcripts in a subset of the prostate tumors for
which RNA was available using RT-PCR (Table 1).
The RT-PCR products confirmed the fusion of
both T1 and T1-2 to different exons of ERG by
sequencing analysis (Fig. 4). It is interesting that
the expression of ERG variant 2 was only observed
in tumors expressing the fusion transcripts,
whereas the expression of variant 1 was detected in
all samples. It is also worth noting that some of the
fusion transcripts were found in tumors (e.g.,
G9-007T, G9-008T, G7-042T) where T-E deletion was
not detected by the 500K SNP array (Table 1).

DISCUSSION 

Two major findings have been reported since
the discovery of the recurrent fusion between
TMPRSS2 and ERG transcription factor genes in
prostate cancer by Tomlins et al. (2005). First, we
and others have demonstrated that genomic deletions
contribute to the fusion between TMPRSS2
and ERG (Liu et al., 2006; Perner et al., 2006;
Yoshimoto et al., 2006). Second, a number of distinct
fusion transcripts have been reported, with some of
them associated with the outcome of prostate cancer
(Clark et al., 2006; Soller et al., 2006; Wang
et al., 2006; Demichelis et al., 2007). Although alternative
splicing of an mRNA encoded by a single
TMPRSS2/ERG fusion gene has been proposed for
this phenomenon (Wang et al., 2006; Clark et al.,
2007), the source of the diversity observed among
these fusion transcripts remains elusive. In our
new work, we now demonstrate that genomic alterations
at different locations within TMPRSS2 and
ERG contribute to the production of various fusion
transcripts found in various prostate cancer cells,
based on our data showing an association between
a specific genomic deletion and a particular form of
fusion transcript in several tumors. This conclusion
is supported by the observation of multiple breakpoints
in introns 1 and 2 of TMPRSS2 and corresponding
transcripts in xenograft DNAs (Hermans
et al., 2006).

Using allele-specific analysis of the allele intensity
and genotype data generated from the GeneChip
500K SNP array, we first showed that both
hemizygous and homozygous deletions occurred
between and within TMPRSS2 and ERG in prostate
tumors. In tumor G8-004T, with homozygous
deletions of 8 of the 11 exons in ERG and at least
12 of the 14 exons in TMPRSS2, we found that
all of the genes between TMPRSS2 and ERG
remained intact except MX1, which lost 11 of its 17
exons (Fig. 2b). This result suggests that deletions
of these genes may not be as biologically important
as the fusion of TMPRSS2 and ERG in terms of
prostate cancer progression. Although it is not clear
whether the nondeleted portion still remains on
chromosome 21, the presence of the fusion transcripts,
T1/E4-11 and T1-2/E4-11, in this tumor
indicates a portion of the DNA between these two
genes may have translocated to another location in
the genome. Therefore, in addition to the deletion,
translocation is another possible mechanism mediating
the fusion of TMPRSS2 and ERG, which is
consistent with the observations from previous
studies (Tomlins et al., 2005; Hermans et al.,
2006).

In the other 15 tumors harboring T-E deletions,
at least one copy of these 14 known genes
described above was lost (Fig. 1). Although the loss
and decreased expression of the genes between
TMPRSS2 and ERG have been proposed to be
associated with cancer progression and tumor
growth, respectively (Birger et al., 2005; Perner
et al., 2006), the synergistic effects of the loss of
these genes and the fusion of TMPRSS2 and ERG
on cancer progression need to be further investigated
in a larger population of prostate tumors
before definitive conclusions can be drawn.

To further characterize the regions involved in
the deletions, we aligned the regions where the
breakpoints were mapped using CLC Free Workbench
bioinformatics software (http://clcbio.com).
It is interesting that there is at least one consensus
sequence identified in the involved regions of the
introns of TMPRSS2 and ERG with ~80% of the
two sequences identical. These results are consistent
with the findings of Yoshimoto et al. (2006)
who reported that at least one area in the intron
following the transcribed TMPRSS2 exon had up
to 90% homology to multiple areas of the intron
preceding the relevant ERG exon for each of the
fusion transcripts in their study. We further found
that these consensus sequences exist not only in
TMPRSS2 and ERG introns that are involved in
partial deletions between these two genes, but also
in other rearrangements, e.g., in the ERG intron
and a nongenic region that are involved in a deletion
of 8 ERG exons and in the MX1 and TMPRSS2
introns that are involved in a deletion of 11 MX1
exons and at least 12 TMPRSS2 exons without
deleting the other genes located in between (see
Fig. 2b). When aligning the two consensus sequences
identified in the regions containing the breakpoints
of the deletion, we found that they are very
similar to the human Alu-Sq and Alu-Sp subfamily
consensus sequences, with more than 80%
homology.

Figure 5. Schematic presentation of the relationship between the
locations of the human Alu family consensus sequence in TMPRSS2 and
ERG and the forms of the fusion transcripts summarized by Clark et al.
(2007). *Marks the TMPRSS2/ERG fusion transcripts identified/predicted
in the tumors analyzed in this study. The sizes of the introns and exons
of these two genes are not drawn to scale. Dark blue, TMPRSS2 exons.
Light blue, TMPRSS2 intron containing Alu family consensus sequence.
Red, ERG exons. Pink, ERG introns containing Alu family consensus
sequence. [Color figure can be viewed in the online issue, which is
available at www.interscience.wiley.com.]

Genomic deletions mediated by recombination
between Alu elements in humans have been
reported. Some of the genomic alterations associated
with Alu repeats have been observed to be
correlated with various cancers including hereditary
nonpolyposis colorectal and endometrial cancers
(Mauillon et al., 1996), acute myeloid leukemia
(Strout et al., 1998), and breast cancer (Gad
et al., 2002). To explore the association between Alu
distribution and the diversity of TMPRSS2/ERG
fusion transcripts in prostate cancers, we searched
for the human Alu family consensus sequence in
TMPRSS2 and ERG using the reference genes
(UCSC hg17, NM_005656.2, NM_004449.3) in the
NCBI Bl2seq utility. As presented in Figure 5, the
distribution of Alu matches well with all 17 structures
of TMPRSS2/ERG fusion transcripts reported
so far (summarized by Clark et al., 2007) and the
new fusion transcript identified in this study, where
the retention of exons 1-5 of TMPRSS2 and deletion
of exons 1-5 of ERG were observed in various
tumors. The presence/absence of Alu family consensus
sequence in the introns of TMPRSS2 and
ERG apparently correlates with the presence/absence
of fusion transcripts of these two genes.
Therefore, the Alu element may indirectly contribute
to the diversity of the fusion transcripts found
in various prostate cancers by facilitating recombination
for deletion, translocation, and/or fusion of
TMPRSS2 and ERG. If this is the case, additional
new forms of fusion transcripts will be identified
when more prostate tumors are analyzed in future
studies. On the other hand, the association between
presence/absence of fusion transcripts and
presence/absence of Alu elements as shown in
Figure 5 could be due to chance, because of the
abundance of Alu sequences in the human genome
(more than one million copies). Therefore, whether
these consensus sequences mediate recombination
for deletion, translocation, and/or fusion of TMPRSS2
and ERG in prostate cancer needs to be further
investigated.

We also found a significant association (P =
0.0179, Fisher's exact test) between the expression
of the TMPRSS2/ERG fusion transcript and the
expression of the ERG variant 2 transcript
(NM_004449.3) that lacks an in-frame exon compared to
variant 1 (NM_182918) in prostate tumors (Table 1),
although the molecular mechanism remains unknown.
However, this observation should be confirmed
using a large population from a different
cohort in future studies.
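
For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table (fusion transcript present/absent versus ERG variant 2 present/absent) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, the counts in the usage line are made up for illustration and are not the counts underlying the reported P value, and the two-sided P is taken as the sum of all table probabilities no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts only: rows = fusion present/absent,
# columns = ERG variant 2 present/absent
print(fisher_exact_two_sided(5, 0, 0, 5))  # 2/252, about 0.0079
```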

It is worth noting that detectable deletions/translocations
at chromosome band 21q22 cannot
account for all of the fusion transcripts found in
various tumors. For example, some of the fusion
transcripts were also detected in the tumors
(G7-042T, G9-007T, G9-008T) where apparent T-E deletion
was not detected by the 500K SNP array
(Table 1). While the process that produced these
transcripts remains unknown, they might result
from balanced translocations, which can be further
assessed in these types of tumor cells using different
techniques, such as FISH and chromosomal
painting.

The correlation between the expression of various
fusion transcripts and aggressiveness/clinical
outcome of prostate cancer has been contradictory
across various studies. Although the expression of
some TMPRSS2/ERG transcripts was reported to
be associated with clinical outcome of prostate cancer
(Perner et al., 2006; Wang et al., 2006), we
found no significant correlation between T-E deletion
and Gleason score (Gleason 6 to 9) in
our study population. This result is consistent with
the observations reported by Yoshimoto et al.
(2006) and Lapointe et al. (2007). However,
Winnes et al. (2007) observed a tendency of
fusion-positive tumors being associated with lower Gleason
grade and better survival than fusion-negative
tumors. On the other hand, Demichelis et al.
(2007), Mehra et al. (2007), Nam et al. (2007), and
Rajput et al. (in press) reported that TMPRSS2/ERG
fusion was associated with lethal, high
pathologic-stage, higher rate of recurrence and poorly
differentiated tumors, respectively, providing thus
far the strongest correlation between the fusion
of these two genes and the outcome of prostate
cancers.

In summary, we have demonstrated that both
hemizygous and homozygous deletions occurred
between and within TMPRSS2 and ERG at different
locations on chromosome band 21q22 in prostate
tumors. The presence of specific genomic
deletions between TMPRSS2 and ERG and a corresponding
form(s) of fusion transcript(s) in various
tumors demonstrates that multiple genomic
alterations/translocations contribute to the diversity of
the fusion transcripts found in various prostate
cancers.
Abstract

The evidence for tumor suppressor genes at 8p is well
supported by many somatic deletion studies and genetic
linkage studies. However, it remains a challenge to pinpoint
the tumor suppressor genes at 8p primarily because the
implicated regions are broad. In this study, we attempted to
narrow down the implicated regions by incorporating
evidence from both somatic and germline studies. Using
high-resolution Affymetrix arrays, we identified two small
common deleted regions among 55 prostate tumors at 8p23.1
(9.8-11.5 Mb) and 8p21.3 (20.6-23.7 Mb). Interestingly, our
fine mapping linkage analysis at 8p among 206 hereditary
prostate cancer families also provided evidence for linkage at
these two regions at 8p23.1 (5.8-11.2 Mb) and at 8p21.3
(19.6-23.9 Mb). More importantly, by combining the results from the
somatic deletion analysis and genetic linkage analysis, we
were able to further narrow the regions to ~1.4 Mb at 8p23.1
and ~3.1 Mb at 8p21.3. These smaller consensus regions may
facilitate a more effective search for prostate cancer genes at
8p. [Cancer Res 2007;67(9):4098-103]

Introduction 

Deletion of sequences from chromosome 8p is the most
common deletion event in the genome of prostate tumors (1). In
a recent study that estimated the frequency of DNA copy number
alterations in the prostate cancer genome based upon all published
comparative genomic hybridization studies of prostate cancers,
we found that one third of 891 prostate cancers had a deletion at
8p21.3, considerably higher than the second most commonly
deleted region at 6q15 (22.4%; ref. 2). Despite the overwhelming
evidence for 8p deletions, few specific genes have been consistently
implicated as prostate tumor suppressor genes in this region. One
of the major obstacles in the identification of tumor suppressor
genes at 8p is the size of the deleted regions, which is affected by
the resolution of methods used to detect deletions. For example, in
our study cited above (2), the deleted region at 8p21.3 spans a
27.1-Mb interval extending into 8p23.3 and 8p21.1 and contains many
genes. Higher-resolution detection methods that can detect small
deletions and complex deletion patterns are needed to identify 8p
tumor suppressor genes (3).

Furthermore,  results  from  genetic  linkage  studies  have  provided 
evidence  for  prostate  cancer  linkages  at  8p  (4,  5).  In  the  largest 
genome-wide  linkage  analysis  done  to  date,  Xu  et  al.  (5)  found 
suggestive  evidence  for  linkage  at  8p21,  one  of  the  five  most 
significant  in  the  genome,  among  1,233  prostate  cancer  families  of 
the  International  Consortium  for  Prostate  Cancer  Genetics 
(ICPCG).  Similar  to  the  results  of  somatic  deletion  studies,  few 
genes  in  this  8p  region  have  been  consistently  implicated  as  major 
prostate  cancer  susceptibility  genes  accounting  for  the  8p  linkage. 
One  of  the  major  difficulties  is  the  low  resolution  of  genetic  linkage 
studies,  which  are  typically  in  the  range  of  10  to  20  cM,  due  to 
limited  meiosis  events  in  families.  For  example,  the  1-LOD  drop 
interval  of  8p21  linkage  identified  in  the  ICPCG  study  was  13  cM 
(39-52  cM)  or  10  Mb  (22-32  Mb). 

Cancers are thought to arise as a result of alterations in
expression of tumor suppressor genes and oncogenes in prostate
epithelial cells. Altered gene expression may result from inherited
genetic changes and acquired somatic genetic changes, including
deletions, as hypothesized by the "two-hit" model (6). Therefore,
assuming that at least some fraction of prostate cancers arise from
a combination of inherited and acquired genomic events affecting
the same gene or combination of genes, studies that simultaneously
examine inherited genetic changes and somatic genetic
alterations of chromosomal regions or genes may improve the
likelihood of identifying genes involved in cancer development.

In  this  study,  we  have  taken  three  steps  to  identify  genomic 
regions  that  contain  prostate  cancer  genes.  First,  we  used  high- 
resolution  Affymetrix  single  nucleotide  polymorphism  (SNP) 
arrays  to  examine  detailed  deletion  patterns  at  8p  among  55 
prostate  cancers.  Second,  we  did  a  fine  mapping  linkage  analysis 
in  206  hereditary  prostate  cancer  (HPC)  families.  Finally  and 
more  importantly,  we  integrated  results  from  somatic  deletion 
analysis  and  germline  linkage  analysis  to  identify  a  consensus 
region. 
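
The integration step can be viewed as an interval intersection. The sketch below is illustrative only and is not the authors' procedure: it takes the somatic deletion intervals and the 1-LOD drop linkage intervals quoted in the Abstract (positions in Mb, with the 8p23.1 linkage start of 5.8 Mb as read from the Abstract) and intersects them band by band. The function and variable names are hypothetical.

```python
def overlap(a, b):
    """Return the intersection of two (start, end) intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Intervals in Mb, as quoted in the Abstract
somatic = {"8p23.1": (9.8, 11.5), "8p21.3": (20.6, 23.7)}
linkage = {"8p23.1": (5.8, 11.2), "8p21.3": (19.6, 23.9)}

consensus = {band: overlap(somatic[band], linkage[band]) for band in somatic}

for band, region in consensus.items():
    if region is not None:
        width = round(region[1] - region[0], 1)
        print(band, region, width)
# 8p23.1 (9.8, 11.2) 1.4
# 8p21.3 (20.6, 23.7) 3.1
```

The widths of the two intersections, 1.4 Mb and 3.1 Mb, agree with the consensus region sizes given in the Abstract.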

Materials  and  Methods 

Detection of 8p deletions in somatic DNA from prostate cancers. All
subjects were prostate cancer patients undergoing radical prostatectomy
for treatment of clinically localized disease at Johns Hopkins Hospital. For
the somatic DNA deletion analysis of prostate cancers, we selected 55
subjects from whom genomic DNA of sufficient quantity (>5 μg) and purity
(>70% cancer cells for cancer specimens, no detectable cancer cells for
normal samples) could be obtained by macrodissection of matched
nonmalignant (hereafter referred to as "normal") and cancer-containing
areas of prostate tissue as determined by histologic evaluation of
H&E-stained frozen sections of snap-frozen radical prostatectomy specimens.
Two Consensus Regions of Prostate Cancer Genes at 8p


Genomic  DNA  was  isolated  from  trimmed  frozen  tissues  as  previously 
described  (7). 

We used Affymetrix SNP array panels to detect DNA copy number
alterations, and for this study, we focused entirely on 8p deletions. We used
the 100K SNP array for the first 22 subjects (3) and then used the 500K SNP
array for the final 41 subjects (eight samples were analyzed using both 100K
and 500K arrays). For all subjects, we analyzed both tumor DNA and normal
DNA from the same subject using Affymetrix SNP arrays (either 100K or
500K array). The normal DNAs were extracted from histologically normal
prostate tissue from the same prostate or from the seminal vesicle of the
same patient. DNA copy number was calculated based on allele intensity
using two different software packages: Copy Number Analyzer for
Affymetrix GeneChip (CNAG2.0; ref. 8) and dChip analyzer (9).
Allele-specific analysis was also done to estimate DNA copy number for each
chromosome using CNAG2.0. The physical positions of detected deletions
were based on the Human hg17 Assembly (NCBI Build 35). The criteria used
in dChip analysis for this study are similar to those used for the 100K SNP
array analyses using CNAT, which have been described in our previous
publication (3). Briefly, to reduce random noise in allele intensity at
individual SNPs, we first estimated DNA copy number based on flanking
SNPs in the region, using the 10-SNP smoothing setting of dChip software to
obtain a genome smooth average copy number (GSACN) for each SNP. We
then defined deletions using the following working criteria: a minimum of
four consecutive SNPs with at least three of them having the following
characteristics: the GSACN ratios of tumor/matched normal <0.75, the
GSACN of the tumor DNA <1.9, and the minimum physical length of the
putative deletion ≥2 kb. To define deletions using CNAG2.0 software (for
both intensity-based and allele-specific analyses), we also used a 10-SNP
smoothing setting to minimize random variations at individual SNPs. Each
deletion is defined by whether the log 2 ratio of probe intensity is below


Figure 1. Small and complex partial deletions of 8p. A, a visual summary for five of the smallest and most complex tumors that were detected at 8p. Top, distances in
Mb beginning at the pter (Chr8). Allele-specific CNAG output for each of these five tumors, with red or green for each respective chromosome. Solid vertical lines
highlight a region at 8p21.3-8p21.2, spanning from 20.6 to 23.7 Mb, that was deleted in all 30 tumors. Dotted vertical lines indicate a deleted region spanning from 9.8 to
11.5 Mb at 8p23.1 that is shared by 29 of the 30 tumors. B, results of deletion confirmation of the homozygous deleted regions by quantitative real-time PCR analyses.
Left, results from DBC2/RHOBTB1 primers located within the putative homozygous deletion. Right, results from LOXL2 primers located outside the homozygous
deleted interval. Ct values of the control (X-axis) and test (Y-axis) amplicons for the three dilutions of each DNA sample were plotted against each other. Offsets
between best-fit lines for the samples along the test amplicon axis at 25 Ct of the control amplicon axis are used to infer DNA copy number. Tumor DNA is defined as
having a hemizygous deletion when ΔCt < 0.68 and as having homozygous deletion when ΔCt > 0.68, assuming 25% normal DNA contamination.
Table 1. Estimated regions of large deletions at 8p

                    Position (start-end), Mb
Tumor    Gleason    Hemizygous deletions                                            Homozygous deletions
G7-022   6          19.5-27.4
G7-013   7          9.8-11.5, 20.6-30.2, 31.1-40.8
G7-028   7          3.2-14.7, 15.2-15.9, 18.1-36.7                                  4.0-5.0, 5.6-5.8
G9-010   9          0.2-0.9, 5.2-13.6, 16.9-23.7, 24.2-24.4, 39.2 to centromere
G7-017   7          0.2-26.8
G9-005   9          6.3-39.0
G9-003   9          4.1-36.1
G7-021   7          4.9 to centromere
G7-026   7          0.2-31.5
G6-002   6          0.2-42.4                                                        22.4-23.0, 25.9-26.0
G7-029   7          0.2-39.0
G6-015   6          0.2-42.0
G9-008   9          0.2-39.9
G8-005   8          0.2 to centromere
G8-002   8          0.2-42.2
G9-004   9          0.2-43.4
G7-033   7          0.2-43.6
G7-042   7          0.2-43.2
G7-051   7          0.2 to centromere*
G7-016   7          0.2-42.3
G7-023   7          0.2-43.8
G7-019   7          0.2-43.8
G7-004   7          0.2-43.0
G9-001   9          0.2-43.6
G9-009   9          0.2-43.8
G9-011   9          0.2-43.5
G7-015   7          0.2 to centromere*
G7-024   7          0.2 to centromere*
G6-016   6          0.2 to centromere*

Note: The 10-SNP smoothing log 2 ratio output from CNAG 2.0 for the
study subjects can be found at
http://www1.wfubmc.edu/Genomics/Publications+and+Data/.

*Deletion extends beyond the centromere.

(no  overlap)  with  the  baseline  log  2  ratio  defined  by  the  matched  normal 
DNA.  The  baseline  log  2  ratio  has  a  theoretical  value  of  zero,  with  small 
variations  due  to  random  noise  (as  shown  in  Fig.  1).  The  results  from  all 
three  analyses  are  in  agreement  with  each  other,  except  that  the  boundaries 
of  the  deletions  defined  by  different  analyses  varied  from  one  to  five  SNPs. 
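
The dChip-based working criteria described above can be sketched as a sliding-window rule. This is one possible reading of those criteria, not the authors' implementation: the `Snp` record, the function names, and the choice to merge overlapping qualifying windows are assumptions made for illustration, and the GSACN values are taken as already smoothed.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    position: int        # physical position, bp
    tumor_gsacn: float   # smoothed copy number, tumor
    normal_gsacn: float  # smoothed copy number, matched normal


def snp_is_low(snp, max_ratio=0.75, max_tumor=1.9):
    """Per-SNP test: tumor/normal ratio < 0.75 and tumor GSACN < 1.9."""
    return snp.tumor_gsacn / snp.normal_gsacn < max_ratio and snp.tumor_gsacn < max_tumor


def call_deletions(snps, window=4, min_low=3, min_length=2000):
    """Return merged (start, end) intervals for SNPs sorted by position.

    A window of four consecutive SNPs qualifies when at least three pass
    the per-SNP test and the window spans at least 2 kb.
    """
    calls = []
    for i in range(len(snps) - window + 1):
        block = snps[i:i + window]
        start, end = block[0].position, block[-1].position
        n_low = sum(snp_is_low(s) for s in block)
        if n_low >= min_low and end - start >= min_length:
            if calls and start <= calls[-1][1]:
                calls[-1] = (calls[-1][0], end)   # extend the previous call
            else:
                calls.append((start, end))
    return calls


# Hypothetical smoothed values for eight SNPs spaced 1 kb apart
tumor = [2.0, 2.0, 1.2, 1.1, 1.3, 1.2, 2.0, 2.1]
snps = [Snp(i * 1000, t, 2.0) for i, t in enumerate(tumor)]
print(call_deletions(snps))  # [(1000, 6000)]
```

Because a window qualifies with three of four SNPs passing, the called interval can include one flanking SNP on each side that does not itself pass the per-SNP test, as in the example output.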

A  subset  of  putative  deletions  were  subject  to  confirmation  by 
quantitative  real-time  PCR  using  the  ABI  Prism  7000  Sequence  Detection 
System,  as  described  in  detail  elsewhere  (3). 

Linkage  analysis  in  prostate  cancer  families  and  construction  of  a 
recombinant  map.  All  206  HPC  families  were  collected  and  studied  at  the 
Brady  Urology  Institute  at  Johns  Hopkins  Hospital  (Baltimore,  MD)  as 
described  previously  (4).  Prostate  cancer  diagnosis  was  verified  by  medical 
records  for  each  affected  male  studied.  Age  at  diagnosis  of  prostate  cancer 
was  confirmed  either  through  medical  records  or  from  two  other 
independent  sources.  The  mean  age  at  diagnosis  was  64.3  years  for  the 
cases in these families. Eighty-four percent of the families are non-Jewish
Caucasians, 6.9% are Ashkenazi Jewish, and 8.8% are African Americans.

Thirty fine mapping microsatellite markers spanning about 35 Mb at 8p
were genotyped in these 206 HPC families. Following multiplex PCR using
fluorescently labeled primers, the resulting PCR fragments were separated
by capillary electrophoresis on an ABI 3700 sequencer. Marker allele
frequencies were estimated from the 214 independent individuals in the
data set. The marker order and distances were primarily based on
information available from the MAP-O-MAT web site (10). Four markers
were not available in the MAP-O-MAT web site; their order and distances
were interpolated from the University of California Santa Cruz (UCSC)
Genome Browser. Multipoint linkage analyses were done using both
parametric and nonparametric methods implemented by the computer
program GENEHUNTER-PLUS (11, 12). For the parametric analysis, the
same autosomal-dominant model that was used by Smith et al. (13) was
assumed. For the nonparametric analysis, the estimated marker identical by
descent (IBD) sharing of alleles for the various affected relative pairs was
compared with its expected values under the null hypothesis of no linkage
(NPL). The "Z-all" statistic in the program was used (14). Allele sharing LOD
scores were then calculated based on the "Z-all" statistic, assigning
equal weight to all families, using the computer program ASM (12).
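
As a reading aid, a LOD score can be related to a P value under one common convention, in which 2 ln(10) x LOD is treated as a chi-square statistic with 1 degree of freedom. The sketch below assumes that convention; the text does not state how the P values reported in the Results were computed, so this should not be read as the authors' procedure. The function name is hypothetical.

```python
from math import erfc, log, sqrt


def lod_to_p(lod):
    """Approximate P value for a LOD score, assuming chi-square with 1 df."""
    chi2 = 2.0 * log(10.0) * lod
    # Survival function of a 1-df chi-square: P(X > c) = erfc(sqrt(c / 2))
    return erfc(sqrt(chi2 / 2.0))


for lod in (2.51, 1.50):
    print(lod, lod_to_p(lod))
```

Under this assumption, LOD scores of 2.51 and 1.50 correspond to P values of roughly 0.0007 and 0.009, respectively. A one-sided convention would halve these values.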

Results 

Detection of somatic DNA deletions. Detectable deletions at
8p were observed in 29 of the 55 prostate cancers (52.73%)
examined in this study (Table 1). Although many of these deletions
involved almost the entire short arm of chromosome 8, we
detected partial 8p deletions in 10 of these tumors. Among these
partial deletions of 8p, five were smaller or more complex (Fig. 1A).
This includes a tumor with a small deletion at 8p21 (G7-022), three
tumors containing multiple interstitial deletions (G7-013, G9-010,
and G7-028), and two tumors with several small homozygous
deletions (G7-028 and G6-002). No copy number polymorphisms
were detected in these samples, and these homozygous deletions
are due to somatic DNA loss.

To  independently  confirm  the  ability  of  our  method  to  detect 
either  heterozygous  or  homozygous  deletion  events  at  8p21.3,  we 
did  quantitative  real-time  PCR  analyses  for  tumor  G6-002  using 
two  primer  pairs:  one  located  within  the  putative  homozygous 
deletion (DBC2/RHOBTB1) and the other located outside the
homozygous deleted interval (LOXL2). The results of this analysis
were most consistent with the deletion being homozygous at the
DBC2/RHOBTB1 locus (ΔCt = 0.90) and being flanked by
hemizygous deletions at the LOXL2 locus (ΔCt = 0.38; Fig. 1B).

The  pattern  of  deletions  observed  among  the  partial  8p  deletions 
suggests  the  presence  of  two  smaller  deletion  regions.  In  particular, 
a region at 8p21.3-8p21.2, spanning from 20.6 to 23.7 Mb, was
deleted in all 29 tumors (Fig. 1A, solid vertical lines). We detected
another deleted region spanning from 9.8 to 11.5 Mb at 8p23.1 that
is shared by 28 of the 29 tumors (Fig. 1, dotted vertical lines). These
regions seem to be separated primarily because of the three tumors
with interstitial deletions.

Prostate  cancer  linkage  region.  Linkage  analysis  of  206 
prostate  cancer  families  provided  evidence  for  a  susceptibility 
gene  at  8p  from  both  parametric  (using  a  dominant  model)  and 
nonparametric  linkage  analyses  (Fig.  2).  Interestingly,  two  separate 
linkage  peaks  were  observed.  One  peak  was  found  at  the  marker 
D8S258  of  8p21.3  (20,411,446),  with  a  LOD  score  of  2.51  (P  =  0.0007) 
and  an  NPL  score  of  3.14.  The  1-LOD  drop  interval  spanned 
~4 Mb, between 19.6 and 23.9 Mb. The other peak was found at
the marker D8S503 of 8p23.1 (9,270,543), with a LOD score of 1.50
(P = 0.009) and an NPL score of 2.72. The 1-LOD drop interval
spanned ~5.4 Mb, between 5.8 and 11.2 Mb. There were 49 families


3 http://genome.ucsc.edu/

with LOD scores > 0.588 (P nominal = 0.05) within the 8pter-8p12
region, with all but two of these families being linked to at least one
of the two regions described above. Among these families, 18
families had positive LOD scores across these two regions, 16
families had positive LOD scores only at 8p21.3, and 13 families had
positive LOD scores only at 8p23.1.
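
The LOD threshold of 0.588 quoted above can be related to a nominal P value of 0.05 with a standard conversion. The following is a minimal sketch, assuming that 2 ln(10) x LOD follows a chi-square distribution with 1 degree of freedom and that the test is one-sided. Both are assumptions made here for illustration; the text does not state which conversion was used, and the function name is hypothetical.

```python
import math


def lod_to_nominal_p(lod: float) -> float:
    """One-sided nominal P value for a LOD score.

    Assumes 2 * ln(10) * LOD is chi-square distributed with 1 df
    (an illustrative assumption, not a method stated in the text).
    """
    lod = max(lod, 0.0)  # negative LOD scores carry no evidence for linkage
    chi_sq = 2.0 * math.log(10.0) * lod
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    two_sided = math.erfc(math.sqrt(chi_sq / 2.0))
    return 0.5 * two_sided


if __name__ == "__main__":
    # A LOD of 0.588 gives a nominal P value of approximately 0.05.
    print(f"{lod_to_nominal_p(0.588):.4f}")
```

Under these assumptions the 0.588 cutoff and the nominal P of 0.05 agree to about three decimal places.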

Combined  results  from  somatic  deletion  and  genetic  linkage 
analyses.  When  we  combined  the  results  from  our  somatic  deletion 
study and germline linkage study, the results overlapped, implicating
two consensus regions at 8p (Fig. 3). One was at 8p21.3
between 20.6 and 23.7 Mb, and the other was at 8p23.1 between 9.8
and 11.2 Mb. Many known and predicted genes are located within
these two consensus regions. Five known genes and seven predicted
genes are located within the 8p23.1 consensus region. A far
greater number of genes are located at the 8p21.3 consensus region,
with  at  least  37  known  protein-coding  genes.  Some  of  these  genes, 
including  NKX3.1,  have  been  previously  associated  with  HPC  (15). 
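
The consensus regions above are the overlap between the somatic deletion regions and the 1-LOD drop linkage intervals reported in the Results. A minimal sketch of that interval arithmetic follows; the coordinates (in Mb) are taken from the text, while the function and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start_Mb, end_Mb)


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the intersection of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Deleted regions and 1-LOD drop linkage intervals as reported in the text.
deleted: Dict[str, Interval] = {"8p21.3": (20.6, 23.7), "8p23.1": (9.8, 11.5)}
linkage: Dict[str, Interval] = {"8p21.3": (19.6, 23.9), "8p23.1": (5.8, 11.2)}

consensus = {band: overlap(deleted[band], linkage[band]) for band in deleted}

if __name__ == "__main__":
    for band, region in consensus.items():
        if region is not None:
            size = region[1] - region[0]
            print(f"{band}: {region[0]}-{region[1]} Mb (~{size:.1f} Mb)")
```

With these inputs the overlap is 20.6-23.7 Mb at 8p21.3 and 9.8-11.2 Mb at 8p23.1, matching the consensus regions stated above.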

Interestingly,  the  8p21.3  homozygous  deletion  (between  22.4  and 
23.0 Mb) identified in the tumor G6-002 falls within the ~3-Mb
8p21.3  consensus  region.  Ten  known  genes  are  located  within  this 
homozygous  deletion  region. 

Discussion 

Chromosome  8p  has  received  a  great  deal  of  attention  from 
cancer  researchers  in  the  past  decade  because  it  is  commonly 
deleted  in  prostate  cancer  (1, 16)  as  well  as  in  many  other  cancers, 
including  colon,  breast,  ovarian,  liver,  lung,  bladder,  and  head  and 
neck  cancer.  Furthermore,  results  from  multiple  genetic  linkage 
studies  provide  evidence  that  8p  may  harbor  major  prostate  cancer 
susceptibility  genes.  Considerable  efforts  have  been  devoted  to  the 
identification  of  specific  prostate  cancer  genes  at  8p  that  account 
for  the  observations  from  deletion  and  linkage  studies.  Although 
several  candidate  genes  at  8p  have  been  reported  to  be  involved  in 
prostate  cancer  development,  including  NKX3.1,  N33,  MSR1,  and 
DLC1,  few  are  consistently  implicated  among  different  studies.  One 
of the major difficulties is the broad genomic regions implicated in
these deletion and linkage studies; for example, most of the
observed  deletions  involve  the  entire  8p  arm.  In  this  study,  we  used 
two  complementary  methodologies  in  an  attempt  to  effectively 
narrow  the  genomic  region(s)  harboring  prostate  cancer  genes.  We 
used  high-resolution  Affymetrix  SNP  arrays  to  define  detailed 
deletion  patterns  at  8p  among  55  prostate  cancers.  This  analysis 
led  to  the  identification  of  two  small  deleted  regions  at  8p21.3  and 
8p23.1.  We  did  a  fine  mapping  linkage  analysis  at  8p  among  206 
HPC  families  and  obtained  evidence  for  linkage  at  these  two 
regions.  Most  importantly,  we  combined  the  results  from  the 
somatic  deletion  analysis  and  genetic  linkage  analysis  to  further 
narrow the regions to ~3.1 Mb at 8p21.3 and ~1.4 Mb at 8p23.1.
These  much  smaller  consensus  regions  will  likely  facilitate  more 
effective  searches  for  prostate  cancer  genes  at  8p. 

The  high-resolution  SNP  arrays  provide  a  better  tool  to  identify 
small  DNA  copy  number  alterations  and  to  examine  detailed 
patterns  of  deletions.  With  a  denser  resolution  of  SNPs  covering  the 
8p  region,  combined  with  allele  specific  analysis,  we  were  able  to 
detect  small  deletions  and  better  define  the  boundaries  of 
deletions.  In  addition,  the  high  density  of  SNPs  revealed  interstitial 
8p  deletions  in  three  cancers.  These  findings  allowed  us  to  identify 
two  small  overlapping  deleted  regions  at  8p21.3  and  8p23.1.  It  is 
interesting  to  note  that  these  two  separate  deleted  regions  are 
within  the  single  27.1-Mb  deleted  region  at  8p  identified  from  a 
combined  analysis  of  891  prostate  cancers  as  the  most  deleted 
region  in  the  genome  (2).  Our  current  study  provides  evidence  that 
the  previously  known  commonly  deleted  region  may  consist  of  two 
separate  deleted  regions. 

The  fine  mapping  panel  of  29  markers  at  8p  in  our  linkage  study, 
with an average of ~1-cM resolution, provides a better tool to
dissect  detailed  linkage  patterns  among  the  206  HPC  families.  In 
this  study,  we  were  able  to  confirm  prostate  cancer  linkage  at  8p 
among  a  large  number  of  prostate  cancer  families.  More 
importantly,  we  were  able  to  obtain  statistical  evidence  for  two 
separate  linkage  regions.  One  of  the  linkage  regions  (19.6-23.9  Mb 
at  8p21.3)  overlapped  with  the  1-LOD  drop  interval  at  8p21  (22-32 
Mb)  reported  from  1,233  ICPCG  prostate  cancer  families  (5). 


Figure  2.  Linkage  analysis  results  at  8p 
among  206  prostate  cancer  families. 

Two primary linkage peaks were
observed. Y-axis, LOD (light purple lines)
and NPL (dark blue lines) scores; X-axis,
physical position along the 8p arm. For the
peaks observed at 8p21.3 and 8p23.1,
vertical dashed lines have been used to
indicate the span of 1-LOD drop intervals.

As hypothesized in the "two-hit" model (10, 11), inherited genetic
defects,  combined  with  acquired  somatic  changes,  ultimately  alter 
the  expression  and/or  function  of  tumor  suppressor  genes  and  lead 
to  cancer.  Therefore,  approaches  that  combine  information  from 
germline and somatic studies may provide better power to identify
cancer genes. This combined approach has been successfully used
to identify the APC gene for familial adenomatous polyposis (FAP).
Results from genetic linkage studies in FAP families, somatic loss of
heterozygosity analysis, and an interstitial germline deletion all
converged to a small region at 5q21 and led to the identification of
the APC gene (17). Although there are large differences between the
rare syndrome of FAP and prostate cancer, the principle of the
"two-hit" model may still apply, and our combined approach
represents  a  critical  step  toward  the  identification  of  prostate 
cancer  genes  at  8p. 

It  is  interesting  that  both  somatic  deletion  analysis  of  prostate 
tumors and germline linkage analysis of prostate cancer families
identified the same genomic regions. Although by no means
conclusive, this overlap is consistent with the hypothesis that the
same gene or genes are affected at both the germline and somatic
levels. Unfortunately, because tumor tissue is not available from the
families linked to this region, we cannot determine whether the
non-linked allele is more likely to undergo somatic deletion, as is
observed in multiple inherited cancer syndromes. The observation that
most  tumors  with  8p  deletions  have  deleted  both  of  the  implicated 
regions,  and  that  at  least  some  prostate  cancer  families  are  linked  to 
both  regions,  suggests  that  multiple  genes  in  these  intervals  may 
need  to  be  affected  before  prostate  carcinogenesis  can  proceed 
effectively.  In  any  event,  the  results  of  this  integrated  analysis 
improve  the  confidence  that  these  two  regions  most  likely  contain 
prostate  cancer  genes  as  well  as  provide  more  detailed  positional 
information  regarding  the  genomic  regions  harboring  these  genes. 

In  summary,  we  have  combined  genetic  linkage  information  with 
somatic  deletion  mapping  in  an  attempt  to  refine  the  localization 
of prostate cancer genes on the short arm of chromosome 8. The
genomic  intervals  narrowed  by  this  combined  approach  provide 
novel  positional  information  useful  for  the  eventual  identification 
of specific genes important in prostate carcinogenesis.

BACKGROUND. Identifying genomic regions that are commonly deleted or gained in
neoplastic  cells  is  an  important  approach  to  identify  tumor  suppressor  genes  and  oncogenes. 
Studies  in  the  last  two  decades  have  identified  a  number  of  common  DNA  copy  number 
alterations  in  prostate  cancer.  However,  because  of  various  sample  sizes,  diverse  tumor  types 
and  sources,  as  well  as  a  variety  of  detection  methods  with  various  sensitivities  and  resolutions, 
it  is  difficult  to  summarize  and  fully  interpret  the  overall  results. 

METHODS.  We  performed  a  combined  analysis  of  all  published  comparative  genomic 
hybridization  (CGH)  studies  of  prostate  cancer  and  estimated  the  frequency  of  alterations 
across  the  genome  for  all  tumors,  as  well  as  in  advanced  and  localized  tumors  separately.  A  total 
of  41  studies  examining  872  cancers  were  included  in  this  study. 

RESULTS. The frequencies of deletions and gains were estimated in all tumors, as well as in
advanced  and  localized  tumors.  Eight  deleted  and  five  gained  regions  were  found  in  more  than 
10%  of  the  prostate  tumors.  An  additional  six  regions  were  commonly  deleted  and  seven  were 
commonly  gained  in  advanced  tumors.  While  8p  was  the  most  common  location  of  deletion, 
occurring  in  about  a  third  of  all  tumors  and  about  half  of  advanced  tumors,  8q  was  the  most 
commonly  gained  region,  affecting  about  a  quarter  of  all  tumors  and  about  half  of  all  advanced 
tumors. 

CONCLUSIONS.  The  large  number  of  tumors  examined  in  this  combined  analysis  provides 
better  estimates  of  the  frequency  of  specific  alterations  in  the  prostate  cancer  cell  genome,  and 
offers important clues for prioritizing efforts to identify tumor suppressor genes and oncogenes
in  these  altered  regions.  Prostate  67:  692-700, 2007.  ©  2007  Wiley-Liss,  Inc. 

KEY  WORDS:  somatic;  gain;  deletion 


INTRODUCTION 

Prostate  cancer  is  the  most  common  cancer 
among  men  in  developed  countries.  It  is  estimated  that 
234,460  new  cases  of  prostate  cancer  will  be  diagnosed 
in  2006  in  the  United  States  [1].  Significant  progress  has 
been  made  in  understanding  the  etiology  of  this  disease 
in  the  past  few  decades.  Family  history  of  prostate 
cancer,  age,  and  race  are  three  well-established  risk 
factors for the disease [2]. Germline and somatic
changes  in  many  genes  have  been  reported  to  be 
involved  in  the  development  of  prostate  cancer; 
however, only a few specific genes have been consistently
implicated.


Cancers  are  thought  to  arise  as  a  result  of  alterations 
in  expression  of  tumor  suppressor  genes  and 
oncogenes in prostate epithelial cells. Altered gene
expression may result from one or a combination of
factors,  including  inherited  genetic  changes,  acquired 
somatic  genetic  changes,  and  epigenetic  changes  such 
as  methylation  and  imprinting.  There  are  three  major 
types  of  somatic  genetic  changes  in  DNA  sequences, 
including point mutations affecting single bases, small-size
deletions or insertions (INDELs), and large-size
deletions  or  gains  (DNA  copy  number  alterations). 

Various  methods  have  been  used  to  detect  somatic 
DNA  copy  number  alterations  in  tumors,  including 
cytogenetic  evaluation  of  chromosomal  aberrations, 
DNA  polymorphism  analysis  for  detecting  loss  of 
heterozygosity  (LOH),  and  comparative  genomic 
hybridization  (CGH)  approaches  for  identifying 
segmental  copy  number  changes.  In  CGH  analysis, 
differentially  labeled  tumor  DNA  and  matched  normal 
DNA  are  co-hybridized  to  a  metaphase  chromosome 
spread (conventional CGH) or a microarray (array-based
CGH), such as cDNA array, oligonucleotide
array,  or  bacterial  artificial  chromosome  (BAC)  array. 
The  ratio  of  the  tumor  versus  normal  hybridization 
intensities  at  specific  intervals  or  probes  indicates  the 
relative  copy  number  at  the  genomic  location  mapped 
to  that  interval. 
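
The ratio-based reading of CGH described above can be sketched as follows. This is only an illustration: the cutoffs of 0.8 and 1.2 are placeholder values assumed here, not thresholds reported in the text or in any of the combined studies, and the function name is hypothetical.

```python
def classify_ratio(
    tumor_intensity: float,
    normal_intensity: float,
    loss_below: float = 0.8,   # assumed illustrative cutoff for a deletion
    gain_above: float = 1.2,   # assumed illustrative cutoff for a gain
) -> str:
    """Classify one interval or probe from its tumor/normal intensity ratio."""
    if normal_intensity <= 0:
        raise ValueError("normal_intensity must be positive")
    ratio = tumor_intensity / normal_intensity
    if ratio < loss_below:
        return "deletion"
    if ratio > gain_above:
        return "gain"
    return "no change"


if __name__ == "__main__":
    for tumor, normal in [(50.0, 100.0), (100.0, 100.0), (150.0, 100.0)]:
        print(tumor / normal, classify_ratio(tumor, normal))
```

In practice the cutoffs depend on the platform and on normal-versus-normal control hybridizations, so they would need to be set per study.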

Studies  of  DNA  copy  number  alterations  in  prostate 
cancers  have  identified  multiple  frequently  altered 
regions in the genome, which has led to the identification
of important prostate tumor suppressors and
oncogenes. Some examples include PTEN at the 10q23
deleted region [3], ATBF1 at the 16q22 deleted region
[4], KLF5 at the 13q21 deleted region [5], AR at the Xq12
gained  region  among  hormone  refractory  tumors 
[6],  and  MYC  at  the  8q21  gained  region  [7].  However, 
it  is  likely  that  other  tumor  suppressor  genes  and 
oncogenes  exist  in  these  frequently  altered  regions  that 
contribute  to  the  selection  advantage  that  occurs  in 
tumor  cells.  Furthermore,  no  specific  genes  have  been 
identified  in  other  commonly  deleted  and  gained 
regions,  probably  due  to  a  combination  of  relatively 
larger  sizes  of  affected  regions  and/or  relatively 
smaller  effects  of  the  genes. 

One  effective  approach  to  improve  the  ability  to 
identify  genes  that  drive  the  selection  of  altered  regions 
in  tumors  is  to  increase  the  study  sample  size  by 
combining  the  results  from  multiple  published  studies. 
Many  studies  on  DNA  copy  number  alterations  in 
prostate  cancers  have  been  published  over  the  past 
20  years,  using  various  molecular  methods.  However, 
the different resolutions of these methods in identifying
DNA copy number alterations, various sample
sizes, and diverse tumor types and sources (localized/primary
tumors, metastatic/recurrent tumors,
xenografts and cell lines) make it difficult to fully
comprehend and interpret the overall results. Therefore,
a systematic and uniform approach was needed. In
this study, we performed a combined analysis of all
published  CGH  studies  of  prostate  tumors  with  the 
intent  of  estimating  the  frequencies  of  DNA  copy 
number  alterations  in  the  genome  and  narrowing  these 
regions  to  facilitate  the  identification  of  genes  driving 
the  selection  of  these  alterations. 

METHODS 

We  searched  the  PubMed  database  for  all  published 
papers  on  DNA  copy  number  alterations  of  prostate 
tumors  using  CGH  methods  as  of  April  of  2006.  A  total 
of 289 papers were retrieved using the key words
"prostate" and "comparative genomic hybridization."
The  relevance  of  these  papers  was  determined  by 
reviewing  the  abstracts,  methods,  and  results  sections 
of  each  paper.  Papers  were  excluded  if  the  DNA  copy 
number  alterations  of  individual  tumor  samples  were 
not  presented  or  if  they  could  not  be  inferred  from 
the  text,  tables,  or  graphs.  To  reduce  the  chance  of  the 
same  study  samples  being  counted  more  than  once,  the 
sources  of  material  for  each  study  were  carefully 
examined  and  then  any  duplicate  samples  were 
removed. In addition, we added four papers from the
references cited in these papers. A total of 41 studies
and  872  tumors  were  included  in  our  combined 
analysis  [8-48]. 

One  of  the  greatest  challenges  of  this  combined 
analysis  was  to  find  a  uniform  approach  to  address  the 
different  resolutions  of  cytogenetic  bands  used  to 
report  DNA  copy  number  alterations  due  to  various 
CGH  methods  in  these  published  studies.  To  reconcile 
these  differences,  we  chose  to  estimate  the  frequency  of 
DNA  copy  number  alterations  based  on  the  850-band 
cytogenetic  map.  The  frequency  of  deletion  or  gain  at 
each  of  the  850  bands  was  estimated  based  on  the 
number  of  tumors  having  deletions  or  gains  among  all 
the  tumors  examined  at  each  band,  respectively.  In 
studies  where  the  results  were  only  graphically 
reported,  we  inferred  the  altered  regions  from  the 
graphs.  The  peak  of  each  altered  region  was  defined  as 
the  cytogenetic  band  that  has  the  highest  percentage  of 
alterations.  The  interval  of  each  altered  region  was 
defined  as  regions  of  cytogenetic  bands  immediately 
surrounding  the  peak  with  a  percentage  that  was 
higher  than  the  lower  bound  of  the  95%  confidence 
interval (CI) of the peak.
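
The peak and interval definitions above can be sketched in a few lines. This sketch assumes a normal-approximation (Wald) 95% confidence interval with z = 1.96, which is an assumption made here; the text does not name the interval method. Only the 8p21.3 counts (284 of 833) come from the Results; the neighbouring bands and their counts are hypothetical, as are the function names.

```python
import math
from typing import List, Tuple

Band = Tuple[str, int, int]  # (band name, tumors altered, tumors examined)


def wald_lower_bound(altered: int, examined: int, z: float = 1.96) -> float:
    """Lower bound of the Wald confidence interval for a proportion."""
    p = altered / examined
    return p - z * math.sqrt(p * (1.0 - p) / examined)


def peak_and_interval(bands: List[Band]) -> Tuple[str, List[str], float]:
    """Find the peak band and the contiguous bands around it whose
    frequency exceeds the lower CI bound of the peak frequency."""
    freqs = [altered / examined for _, altered, examined in bands]
    peak = max(range(len(bands)), key=freqs.__getitem__)
    lower = wald_lower_bound(bands[peak][1], bands[peak][2])
    left = peak
    while left > 0 and freqs[left - 1] > lower:
        left -= 1
    right = peak
    while right < len(bands) - 1 and freqs[right + 1] > lower:
        right += 1
    return bands[peak][0], [name for name, _, _ in bands[left:right + 1]], lower


# Bands listed in chromosomal order. Neighbour counts are hypothetical.
example: List[Band] = [
    ("left-2", 180, 833),
    ("left-1", 270, 833),
    ("8p21.3", 284, 833),  # counts reported in the Results
    ("right-1", 265, 833),
    ("right-2", 200, 833),
]

if __name__ == "__main__":
    peak_name, interval, lower = peak_and_interval(example)
    print(peak_name, interval, f"lower bound = {lower:.4f}")
```

Under this assumption the lower bound for 284 of 833 comes out near 30.87%, the value quoted in the Results, which is consistent with (but does not confirm) a Wald interval having been used.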

The  frequency  of  DNA  copy  number  alterations 
was first estimated in all prostate cancers, including
localized/primary tumors, metastatic/recurrent
tumors, and cell lines or xenografts. These frequencies
were further estimated separately in localized tumors
only, including localized/primary tumors (N = 659)
and in advanced tumors only, including metastatic/recurrent
tumors and prostate cancer cell lines and
xenografts (N = 213). The difference in the frequency of
alterations between the two groups at each affected
region was tested using a two-sample proportion test.
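
A two-sample proportion test of this kind can be sketched as below. The pooled-variance form of the z statistic and the two-sided P value are assumptions made here, since the text does not specify the variant used. The counts in the usage example are hypothetical, chosen only to give proportions close to the 56.25% and 28.00% reported in the Results; because the per-band denominators are not given in the text, the resulting Z-score is not expected to match the reported value.

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z test for a difference in proportions.

    Returns the z statistic and the two-sided P value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 99 of 176 advanced and 184 of 657 localized tumors.
    z, p_value = two_proportion_z(99, 176, 184, 657)
    print(f"Z = {z:.2f}, P = {p_value:.2e}")
```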

The  number  of  transcripts  residing  in  each 
implicated  region  was  obtained  from  the  Sanger 
Institute: http://www.ensembl.org/Multi/martview/9mcHEx6SWc.mart.
We searched for cancer-related
genes in several databases: (1) Sanger
Institute (http://www.sanger.ac.uk/genetics/CGP/Census/chromosome.shtml),
(2) Atlas of Genetics and
Cytogenetics in Oncology and Haematology
(http://atlasgeneticsoncology.org), and (3) ExPASy Proteomics
Server (http://us.expasy.org). We also included additional
candidate cancer-related genes in the implicated
regions  based  on  published  reports. 

RESULTS 

In  total,  872  independent  prostate  tumors,  including 
51  cell  lines  and  xenografts,  were  analyzed  for  DNA 
copy  number  alterations  using  CGH  methods  in  these 
41  papers.  Most  of  these  tumors  were  examined  for 
DNA  copy  number  alterations  in  the  entire  genome; 
however,  several  tumors  were  examined  for  alterations 
in specific chromosomes or chromosomal arms. Therefore,
the number of tumors examined at specific
cytogenetic  bands  was  different  across  the  genome. 

The frequencies of deletions and gains across the
genome in all tumors, as well as in advanced and
localized tumors, were estimated (Fig. 1a,b, and
supplement  Tables  I  and  II).  There  were  13  regions  in 
the  prostate  tumor  genome  that  were  frequently 
altered,  defined  as  those  observed  in  more  than  10% 
of  the  examined  tumors.  Among  these  common 
alterations,  eight  were  deletions  and  five  were  gains. 
In addition, 13 more regions, including six
deletions and seven gains, were found in more than 10%
of advanced tumors. The peak and interval for each of
these  altered  regions  are  presented  in  Table  I. 

Chromosome  8p  was  the  most  commonly  deleted 
region in the genome. The peak was observed at 8p21.3
(19.1-23.4  Mb);  284  of  833  (34.09%)  examined  tumors 
had  a  deletion  at  this  cytogenetic  band.  The  deletion 
pattern  was  unimodal;  the  frequency  of  deletion 
dropped  slightly  at  either  side  of  the  peak.  Seven 
cytogenetic bands (8p23.1-8p21.1, from 6.2 to 28.9 Mb)
immediately surrounding the peak had a frequency of
deletion that was higher than the lower bound of the
95% CI of the estimate at the peak (30.87%). In addition to
this  defined  interval,  the  deletion  curve  extended 
gradually  at  the  centromeric  side  for  five  additional 
cytogenetic  bands  (percentage  slightly  dropped  to 
~17.5% at 8p11.1). The curve dropped sharply to
below 3% beyond the centromere. The second most
commonly deleted region was observed at 13q21.31
(61.2-64.6 Mb); 228 of 813 (28.04%) examined tumors
had a deletion at this cytogenetic band. The deletion
pattern was also unimodal and broad; nine cytogenetic
bands (13q14.13-13q22.1, from 44.7 to 74.2 Mb)
immediately surrounding the peak had an estimated
frequency of deletion higher than the lower bound of
the 95% CI of the estimate at the peak (24.96%). Other
commonly deleted regions were at 6q14.1-6q21
(22.24%), 16q13-16q24.3 (17.85%), 18q12.1-18q23
(12.80%), 5q13.3-5q21.3 (13.06%), 2q21.2-2q22.3
(12.42%), and 10q23.1-10q25.3 (11.76%).

Chromosome  8q  was  the  most  commonly  gained 
region  in  the  genome  of  prostate  tumors.  The  pattern  of 
gain  at  this  region  was  bimodal.  The  first  peak  was 
observed  at  8q22.2  (99.1-101.6  Mb);  215  of  857  (25.09%) 
examined  tumors  had  a  gain  at  this  cytogenetic  band. 
The  second  peak  was  on  the  telomeric  side  at  8q24.13 
(122.5-124.9  Mb);  201  of  837  (24.01%)  examined  tumors 
had  a  gain  at  this  cytogenetic  band.  The  interval  of  the 
8q gain was broad; 12 cytogenetic bands (8q21.3-8q24.3,
from 87 to 143.1 Mb) immediately surrounding
the peak had a frequency of gain higher than the lower
bound of the 95% CI of the estimate at the peak (22.18%).
The frequencies of the remaining commonly gained
regions were all considerably lower (7%-12%).
They were 7q11.21-7q32.3 (12.48%), Xq11.1-Xq23
(10.86%), 17q24.1-17q25.3 (11.65%), and 3q23-3q26.33
(10.24%).

Similar findings were observed when the combined
analyses were performed separately for advanced
and  localized  tumors.  All  of  the  commonly  deleted  and 
gained  regions  identified  in  all  tumor  samples  were 
frequently  altered  in  both  advanced  and  localized 
tumors. The frequencies of these altered regions were,
however, two- to threefold higher in advanced tumors
than in localized tumors (Fig. 1a,b). The differences in
the frequencies between these two groups were
statistically significant; the Z-scores ranged from
3.00 to 11.64 (P values 0.003 to 10^-13) for the eight
deletion regions and from 2.96 to 7.06 (P values
0.003 to ~0) for the five gained regions. For example,
at the 8p deleted region, 56.25% of the advanced tumors
were deleted at 8p21.3, compared with 28.00% of
the localized tumors that were deleted, Z-score = 7.40,
P-value = 1.7 × 10^-13. At the 8q gain region, 47.87% of
the advanced tumors had gains at 8q22.1, compared
with 17.10% of the localized tumors having gains at this
cytogenetic band, Z-score = 8.69, P-value ~0.

In  addition  to  the  regions  that  were  implicated  in  all 
tumors,  there  were  several  novel  altered  regions 
implicated  in  advanced  tumors  only.  Six  additional 
regions  were  deleted  in  more  than  10%  of  advanced 
tumors, including 15q15.1-15q23 (21.93%), 4q13.1-4qter
(22.30%), 22q11.1-22qter (15.48%), 1pter-1p21.1
(14.89%), 9q21.13-9qter (14.19%), and 12p (12.24%).

Fig. 1. The combined frequencies of DNA copy number alterations in the genome of prostate cancers. The X-axis represents the position of
the genome in Mb, and the Y-axis represents the frequency of alterations for deletions (a) and gains (b). Each chromosome was designated by its
corresponding number and the divisions between individual chromosomes are shown by vertical lines. Frequencies in all tumors (N = 872),
localized tumors (N = 659), and advanced tumors and cell lines (N = 225), including metastatic/recurrent tumors (N = 174) as well as prostate
cancer cell lines and xenografts (N = 51), were plotted separately as represented by diamonds, squares, and triangles, respectively. [Color figure
can be viewed in the online issue, which is available at www.interscience.wiley.com.]

TABLE I. Characterizations of the Frequently Altered Chromosomal Regions in the Prostate Cancers

a Regions denoted by 'a' represent altered copy number frequencies that are below 10% in all tumors but above 10% in advanced tumors.
b The numbers of coding proteins were obtained from http://www.ensembl.org/Multi/martview/frKYjCwJyF.mart

c Cancer related genes, with the exception of those that are underlined or marked with asterisks, were defined by three online databases described in the text. Genes that were
underlined are not located in but near the defined regions. Genes with asterisks are not included in the three databases, but their relevance to prostate cancer was proposed in the

Seven additional regions had gains in more than 10%
of the advanced tumors, including 11p15.3-11p13
(13.55%), 1q21.3-1q42.3 (16.49%), 2p22.3-2p12
(11.61%), 9q22.11-9q33.3 (12.23%), 3p21.33-3p14.1
(11.17%), 4q21.1-4q31.3 (10.64%), and 6p21.33-6p11.1
(12.92%).

We  searched  three  public  databases  for  tumor 
related  genes  in  these  commonly  implicated  regions. 
We  also  included  additional  candidate  cancer-related 
genes  that  were  not  included  in  the  databases  but  were 
implicated  in  published  literature.  These  results  are 
presented  in  Table  I. 

DISCUSSION 

In  an  attempt  to  provide  a  systematic  summary  of 
previously  identified  DNA  copy  number  alterations  in 
prostate  cancer,  we  performed  a  combined  analysis  of 
all  CGH  studies  of  prostate  cancer  available  on 
PubMed.  The  large  number  of  cancers  examined  in 
this combined study and the use of a unified cytogenetic
map should improve the ability to accurately
estimate the frequencies of DNA copy number alterations
in the genome.

In  this  analysis,  we  found  eight  commonly 
deleted  regions  and  five  commonly  gained  regions  in 
prostate  tumors.  We  also  found  six  additional  deleted 
regions  and  seven  gained  regions  that  are  commonly 
implicated  in  advanced  tumors.  Our  findings  on 
common  deletions  are  generally  consistent  with  the 
results  of  a  previous  review  paper  [49]  where  the  8p 
deletion  was  found  in  175  of  417  tumors  (42%).  To  our 
knowledge,  our  finding  on  the  commonly  gained 
regions  across  the  genome  of  prostate  tumors  is  the 
first  of  its  kind  in  the  published  literature.  In  addition, 
our  study  is  the  first  to  estimate  the  percentage  of 
deletions  and  gains  present  in  advanced  and  localized 
tumors. 

The  results  of  our  combined  analysis  indicate  that  8p 
is  the  most  commonly  deleted  region  in  the  prostate 
tumor  genome,  affecting  about  a  third  of  all  tumors  and 
half  of  advanced  tumors.  The  second  and  third  most 
commonly  deleted  regions  are  13q  and  6q,  slightly  less 
frequent than 8p. On the other hand, our results clearly
show  that  8q  is  the  most  commonly  gained  region, 
affecting  about  a  quarter  of  all  tumors  and  half  of 
advanced tumors. The frequency of 8q gain is considerably
higher than that of 7q, the second most
commonly gained region, which affects about 10% of all
tumors and nearly a quarter of advanced tumors. The
estimated percentage of alterations at each commonly
implicated region and their rank in the genome should
provide important clues in prioritizing our efforts to
identify tumor suppressor genes and oncogenes in
these  altered  regions. 


Cancer related genes in the intervals of the commonly
altered regions, as shown in Table I, are
reasonable candidate tumor suppressor genes and
oncogenes, and therefore they warrant further examination.
For example, loss of the PTEN gene copy at
10q24 commonly reflects a loss of PTEN tumor
suppressive function and drives the selection of cells
that have alterations at this region [3]. Similarly, the
gain of AR gene copies at the Xq12 commonly gained
region may lead to overexpression of AR and may be
responsible for prostate tumor progression, particularly
towards androgen independent disease. In
addition  to  the  known  candidate  genes,  it  is  quite  likely 
that  multiple  novel  prostate  tumor  suppressors  and 
oncogenes,  located  in  regions  of  recurrent  copy  number 
alteration,  remain  to  be  identified. 

It is important to note the limitations of our study. As
a combined analysis of published papers, our study is
subject to publication bias. Studies may be more likely
to be written up and accepted for publication if their
results resemble those of previously published studies,
and this may inflate our estimated frequencies of the
most commonly implicated regions. On the other hand,
results on novel altered regions may be under-represented
in publications. This may explain the low frequency of
deletions reported at 21q22, which was recently found to
be commonly deleted in prostate cancers [50-52]. In
addition, our results are affected by the limited
resolution of the methods used to identify DNA copy
number alterations in the original published papers.
The vast majority of these published studies used
conventional CGH methods; the limited resolution of
these methods (~10 Mb) may affect the ability to detect
small alterations. Furthermore, the limited resolution
may also contribute to the relatively poor ability to
narrow the altered regions in our study. A small
number of tumors (<8%) in this study were analyzed
using the higher-resolution approach of array-CGH
methods [8-10,12,14,15,18,53-55]. While these had a
better ability to detect small deletions, the sample
size was too small to warrant a separate analysis or to
influence the overall results.

Future studies of DNA copy number alterations
should use methods with higher resolution. Several
high-resolution genome-wide analyses have recently
been developed for assessing DNA copy number
alterations, including representational oligonucleotide
microarray analysis (ROMA) and Affymetrix SNP
mapping arrays [50,56-60]. These newer molecular
methods will improve our ability to accurately identify
DNA copy number alterations in prostate tumors,
especially smaller alterations, and may eventually lead
to the discovery of prostate tumor suppressor genes
and oncogenes.
Although multiple recurrent chromosomal alterations have been identified in prostate cancer cells, the specific genes driving
the apparent selection of these changes remain largely unknown. In part, this uncertainty is due to the limited resolution of
the techniques used to detect these alterations. In this study, we applied a high-resolution genome-wide method, the Affymetrix
100K SNP mapping array, to screen for somatic DNA copy number (CN) alterations among 22 pairs of samples from primary
prostate cancers and matched nonmalignant tissues. We detected 355 recurrent deletions and 223 recurrent gains, many of
which were novel. As expected, the novel alterations tended to be smaller. Importantly, among tumors of increasing
grade (Gleason sum 6, 7, and 8), we found a significant trend toward a larger number of alterations in the tumors of higher grade.
Overall, gains are significantly more likely to occur within genes (74%) than are deletions (49%). However, when we looked at
the most frequent CN alterations, defined as those in ≥4 subjects, we observed that both gains (85%) and deletions (57%)
occur preferentially within genes. An example of a novel, recurrent alteration observed in this study was a deletion between
the ERG and TMPRSS2 genes on chromosome 21, presumably related to the recently identified fusion transcripts from these
two genes. Results from this study provide a basis for a systematic and comprehensive cataloging of CN alterations associated
with grades of prostate cancer, and the subsequent identification of specific genes that are associated with initiation and
progression of the disease. This article contains supplementary material available via the Internet at
http://www.interscience.wiley.com/jpages/1045-2257/suppmat © 2006 Wiley-Liss, Inc.


INTRODUCTION 

Prostate cancer is the most common cancer
among men in the USA. Approximately 235,000
American men may be diagnosed with prostate
cancer, corresponding to 33% of all cancer cases,
and about 27,000 may die of the disease in 2006
according to the American Cancer Society (Jemal
et al., 2006). The lifetime probability of developing
prostate cancer for men is one in six in the USA,
the highest in comparison to other cancers.

Prostate cancer is a heterogeneous collection of
subgroups of cancer that display radically different
clinical behavior. While some prostate cancers
are capable of dissemination leading to death, some
are relatively indolent (Cooperberg et al., 2003).
Some are hormone-sensitive. Some are androgen-independent.
Some are associated with inherited
alterations while others are somatically acquired
(Gonzalgo and Isaacs, 2003). The development of
these different types of prostate cancers may be
regulated by different mechanisms. Correspondingly,
different therapeutic targets and strategies should
be chosen for effective treatment of different types
of prostate cancer. Therefore, there is an urgent
need to understand the mechanisms underlying these
distinct subgroups of prostate cancer and to augment
existing classification methods such as Gleason
grading and TNM staging.

Like other cancers, prostate cancer is characterized
by frequent genomic copy number (CN) alterations
even at early stages. Using cytogenetic, loss of
heterozygosity (LOH), comparative genomic hybridization
(CGH) and other approaches, deletions and
gains in the genome of prostate cancers have been
identified in a number of studies (Kibel et al.,
2000; Dong, 2001, 2006; Chu et al., 2003; Clark
et al., 2003; Dumur et al., 2003; Lieberfarb et al.,
2003; Paris et al., 2003, 2004, 2005; van Dekken
et al., 2003, 2004; Strohmeyer et al., 2004; Teixeira
et al., 2004; Watson et al., 2004; Wolf et al., 2004;
Yano et al., 2004; Kasahara et al., 2005; van Duin
et al., 2005; Postma et al., 2006; Saramaki et al.,
2006). Although several candidate tumor suppressors
and oncogenes have been identified, the vast
majority of cancer-associated genes involved in
these genomic CN alterations are yet to be identified.
This gap is partially due to the lack of high-resolution
and high-throughput methods to effectively
pinpoint genomic CN alterations at the gene level
in a large number of prostate cancers.

Genome-wide analyses with high resolution have
recently been developed for assessing genomic CN
alterations. Representational oligonucleotide microarray
analysis (ROMA), with an average resolution of
about 35 kb, has been used to identify both genomic
abnormalities in tumor cells and CN polymorphisms
(CNPs) in the normal genome (Lucito et al., 2003;
Sebat et al., 2004). The Affymetrix 100K single nucleotide
polymorphism (SNP) mapping array, with an average
resolution of about 24 kb, has recently been
demonstrated to be very effective in association
studies of SNPs and in the identification of genomic
CN alterations including CNPs (Garraway et al.,
2005; Klein et al., 2005; Slater et al., 2005; Zhao
et al., 2005; Liu et al., in press). In the present study,
we report deletions and gains in prostate cancer
genomes detected in DNA samples isolated from 22
pairs of tumor and matched nonmalignant tissues
using Affymetrix 100K SNP mapping arrays.

MATERIALS  AND  METHODS 
Study  Subjects 

All subjects in this study were prostate cancer
patients undergoing radical prostatectomy (RP) for
treatment of clinically localized disease at the
Johns Hopkins Hospital. We selected 22 subjects
from whom genomic DNA of sufficient amount
(>5 µg) and purity (>70% cancer cells for cancer
specimens, no detectable cancer cells for normal
samples) could be obtained by macrodissection of
matched nonmalignant (hereafter referred to as
“normal”) and cancer-containing areas of prostate
tissue, as determined by histological evaluation of
H&E-stained frozen sections of snap-frozen RP
specimens. Genomic DNA was isolated from trimmed
frozen tissues as previously described (Bova et al.,
1993). DNA samples from prostate cancers meeting
the same purity criterion and prepared in an identical
fashion from an additional 69 patients were
included in this study to estimate the frequency of
identified deletions and gains using qPCR.

Affymetrix 100K SNP Mapping Array

The Affymetrix 100K SNP mapping array includes
116,204 SNPs in two chips, Xba240 and Hind240.
The chips and reagents were obtained from Affymetrix
and the assays were carried out according to
the manufacturer’s instructions. Briefly, 250 ng of
genomic DNA was digested with either HindIII
or XbaI and then ligated to adapters that recognize
the cohesive four base-pair (bp) overhangs. A generic
primer that recognizes the adapter sequence
was used to amplify adapter-ligated DNA fragments
with PCR conditions optimized to preferentially
amplify fragments in the 250-2,000 bp size
range in a GeneAmp PCR System 9700 (Applied
Biosystems, Foster City, CA). After purification
with a Qiagen MinElute 96 UF PCR purification
system, a total of 40 µg of PCR product was fragmented
and a sample of about 2.9 µg was visualized
on a 4% TBE agarose gel to confirm that the
average size was smaller than 180 bp. The fragmented
DNA was then labeled with biotin and
hybridized to the GeneChip Mapping 100K Set for
17 hr. We washed and stained the arrays using the
Affymetrix Fluidics Station 450 and scanned the
arrays using a GeneChip Scanner 3000 7G (Affymetrix,
Santa Clara, CA). The Affymetrix GeneChip®
Operating Software (GCOS) collected and
extracted feature data from Affymetrix GeneChip®
Scanners. The GeneChip Genotyping analysis software
(GTYPE) was used to analyze feature intensity
data stored in the GCOS Database, and provided
high-throughput and accurate genotyping analysis.

DNA CN and Classification of Deletions and Gains

DNA CNs were calculated based on the allele intensity
(the sum of the intensities of both alleles) of each SNP
probe on the 100K SNP mapping array using three
different software packages: Chromosome CN
Analysis Tool (CNAT, Huang et al., 2004; Slater
et al., 2005) version 3.0, CN Analyzer for Affymetrix
GeneChip (CNAG, Nannya et al., 2005), and
dChip analyzer (dChip, Lin et al., 2004). Deletions
and gains were defined based on the DNA CNs of the
100K SNPs using a set of working criteria, which are
implemented in an in-house script.

Quantitative  Real  Time  PCR  (qPCR) 

A  subset  of  putative  deletions  and  gains  were 
subjected  to  confirmation  by  quantitative  real-time 
PCR  (qPCR)  using  the  ABI  Prism  7000  Sequence 
TABLE 1. Characteristics of Study Subjects

Subject ID   Gleason sum   Gland weight (g)   Age at surgery   Race
6-675        6             49                 57               European American
6-795        6             47                 51               African American
6-800        6             37                 61               European American
6-816        6             42                 62               European American
6-978        6             46                 62               European American
6-1019       6             50                 53               European American
6-1048       6             53                 44               European American
6-1203       6             49                 60               European American
7-372        7             93                 69               European American
7-700        7             49                 66               European American
7-721        7             42                 58               European American
7-782        7             42                 58               European American
7-814        7             57                 67               European American
7-938        7             74                 53               European American
7-989        7             64                 58               European American
7-994        7             58                 53               African American
8-535        8             55                 55               European American
8-541        8             50                 67               European American
8-780        8             60                 53               European American
8-1070       8             55                 66               European American
9-401        9             40                 NA               European American
9-731        9             45                 52               European American


Detection System. Primers were designed using
Primer Express 1.5 software from Applied Biosystems.
Amplicons were designed against the putatively
altered locus and a control locus (GAPDH)
with a DNA CN of 2. The PCR kinetics at the control
locus were used to control for sample-to-sample
differences in genomic DNA purity and concentration.
Three concentrations of each genomic
DNA sample (20, 10, and 5 ng) were assayed in
duplicate, using each pair of real-time PCR primers.
PCRs were prepared as follows: in 20 µl, we
combined 2 µl of genomic DNA, 0.05 µM of each
primer, and SYBR-Green PCR Master Mix from
Applied Biosystems. PCRs were performed as follows:
95°C for 10 min, followed by 40 cycles at
95°C for 20 sec and 60°C for 1 min. An additional
cycle of 95°C for 15 sec, 60°C for 20 sec, and 95°C
for 15 sec was run at the end to measure the dissociation
curve for quality control. We used the
Sequence Detection Software (SDS) for PCR baseline
subtraction and exported the threshold cycle
number (Ct) data for analysis. Ct values of the control
(x-axis) and test (y-axis) amplicons for the three
dilutions of each DNA sample were plotted. The
differences in the Ct values (ΔCt) between tumor
and matched normal DNA were used to infer DNA
CN. ΔCt is approximately equal to -log2(f),
assuming PCR efficiency is 100%, where f is the ratio
of DNA amount between tumor and normal
samples. ΔCt is equal to 1 if f is 0.5, as for a hemizygous
deletion, and ΔCt is equal to -0.58 if f is 1.5,
as for three copies of DNA. Because of contamination
of normal DNA in macrodissected tumor
DNA in our study, the exact ΔCt values for deletions
and gains are uncertain. Assuming 25-40%
normal DNA contamination, the ΔCt for a hemizygous
deletion is between 0.68 and 0.51, and for
three copies of DNA is between -0.46 and -0.38.
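
The arithmetic behind these ΔCt ranges can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name `expected_delta_ct` and its parameters are hypothetical, and it assumes that a contaminated tumor sample behaves as a simple mixture of normal cells (CN = 2) and tumor cells, with 100% PCR efficiency as stated above.

```python
import math

def expected_delta_ct(tumor_copies, normal_fraction, normal_copies=2):
    """Expected delta-Ct (tumor minus matched normal) at one locus.

    Assumes 100% PCR efficiency, so delta-Ct = -log2(f), where f is the
    ratio of template amount in the tumor sample to that in the normal sample.
    normal_fraction is the (assumed) share of normal cells in the tumor sample.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be between 0 and 1")
    # Average copy number per cell in the mixed tumor sample
    mixed = normal_fraction * normal_copies + (1.0 - normal_fraction) * tumor_copies
    f = mixed / normal_copies
    if f == 0:
        return math.inf  # pure homozygous deletion: no template to amplify
    return -math.log2(f)

if __name__ == "__main__":
    for label, copies in (("hemizygous deletion", 1), ("three copies", 3)):
        for fraction in (0.0, 0.25, 0.40):
            dct = expected_delta_ct(copies, fraction)
            print(f"{label}, {fraction:.0%} normal DNA: delta-Ct = {dct:.2f}")
```

Under this mixture assumption, the 25% and 40% contamination cases give the bounds quoted in the text for both the hemizygous deletion and the three-copy gain.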

Statistical  Analysis 

The Kruskal-Wallis test was performed to assess differences
in the numbers of DNA CN alterations among
the four Gleason groups. The difference in the proportions
of SNPs that are located within genes for
SNPs involved in the altered regions and for all of
the SNPs in the 100K SNP array was tested by the
two-sample proportion test (two-sided).

RESULTS 

Detection  of  DNA  CN  Alterations 
in  Macrodissected  Prostate  Tumors 

Characteristics  of  the  22  subjects,  including  race, 
age  at  surgery,  Gleason  sum,  and  pathologic  stage 
are  presented  in  Table  1.  DNA  samples  prepared 
in  a  similar  fashion  from  prostate  cancers  of  an 
additional  69  patients  were  used  in  qPCR  analysis 
to  estimate  frequencies  of  alterations  found  in 
SNP  based  studies. 

We obtained excellent SNP call rates using the
100K SNP mapping array for DNA samples isolated
from tumor tissues (average of 97.34%) and
Figure 1. Examples of genomic gains on chromosome 3 and deletions
on chromosome 4 identified in tumor genomes using the Affymetrix
100K SNP mapping array. While DNA CNs from matched normal
tissues were ~2 on chromosome 3 (Fig. 1A) and on chromosome 4
(Fig. 1D), DNA CNs were increased (CN = 3-4) in multiple regions on
chromosome 3 (Fig. 1B) and decreased (CN = 1) in multiple regions on
chromosome 4 (Fig. 1E). Results from qPCR analyses were consistent
with those of the SNP mapping array. Compared with the DNA CN of
normal DNA, DNA CN in tumor DNA was higher on chromosome 3
(Fig. 1C) and lower on chromosome 4 (Fig. 1F). [Color figure can be
viewed in the online issue, which is available at www.interscience.wiley.com.]


from matched normal tissues (average of 97.45%).
The high SNP call rates suggest high-quality allele
intensity data. DNA CN at each SNP was estimated
from the allele intensity data using three
different software packages, CNAT, CNAG, and
dChip. Similar results for DNA CNs were obtained
from these analyses; however, we primarily describe
the results obtained from CNAT.

DNA CN alterations can be detected in DNA
from macrodissected cancer samples using the 100K
SNP mapping array. Figure 1 presents typical results
of DNA CNs estimated from tumor DNA and from
the matched normal DNA. While DNA CNs from
matched normal tissues were ~2 for this subject on
chromosome 3 (Fig. 1A) and on chromosome 4
(Fig. 1D), cancer DNA CNs were increased (CN =
3-4) in multiple regions on chromosome 3 (Fig. 1B)
and decreased (CN = 1) in multiple regions on
chromosome 4 (Fig. 1E). We chose one region each from
chromosomes 3 and 4 and performed qPCR. Results
from the qPCR analyses were consistent with those
of the SNP mapping array. Compared with the DNA
CN of normal DNA, DNA CN in cancer DNA was
higher on chromosome 3 (Fig. 1C) and lower on
chromosome 4 (Fig. 1F). Note that this particular
deleted region was relatively small (<1 Mb), but
could be detected by both the SNP mapping array and
qPCR. These results demonstrate that the Affymetrix
100K SNP mapping array can be used to detect CN
alterations in DNA prepared from macrodissected
prostate tumors.

Working  Criteria  for  Defining  Genome-Wide 
Deletions  and  Gains 

There are several factors that are known to prevent
accurate estimates of DNA CNs. First, recent
studies comparing specific loci among different
individuals have reported substantial CN variation
in germline DNA (Iafrate et al., 2004; Sebat et al.,
2004; Sharp et al., 2005; Slater et al., 2005; Tuzun
et al., 2005; Conrad et al., 2006; Hinds et al., 2006;
Liu et al., in press; McCarroll et al., 2006). Second,
random noise in allele intensity at each SNP may
yield unreliable information on DNA alterations at
the region. Third, polymorphisms/mutations in
sequences that interfere with restriction enzyme
digestion or probe hybridization may result,
respectively, in longer DNA fragments, which are
likely to reduce the PCR yield of this region, or in
limited hybridization rates. Ultimately these may
lead to altered allele intensity and incorrect inclusion
or exclusion of CN alterations.

With these issues in mind, we then set out to establish
overall approaches to accurately infer putative
deletions and gains in the whole genome
based on the DNA CNs of the 116,204 SNP probes
in the 100K mapping array. To minimize the
impact of individual variability in germline DNA
CN while improving our detection of somatic (versus
germline) alterations in the cancer genome, we
performed simultaneous SNP analyses of tumor
DNA and matched normal tissue DNA, and then
used the ratio of DNA CNs between tumor and
normal samples as the primary variable. To limit
the potential for artifactual deletions due to additional
point mutations at restriction enzyme sites
that result in fragments that are too long for consistent
PCR amplification, we set a minimum physical
length of 2 kb for putative CN alterations. To
reduce random noise in allele intensity at individual
SNPs, we estimated DNA CN based on multiple
flanking SNPs in the region using the default
500-kb setting of the CNAT software to compute the
genome smooth average copy number (GSACN).
However, we also used the single point copy number
(SPCN) to assist GSACN in defining alterations, to
search for unique, high-resolution information
regarding DNA CN at each specific genomic position.
Collectively, these approaches minimize factors
known to influence the estimates of DNA
CNs, while improving the accuracy of inferring
DNA alterations.

Drawing from our overall approaches, we then
experimented with a set of initial working criteria
to define putative deletions and gains. To do this,
we compared the alterations identified under these
different criteria based on which ones:
(1) gave the highest proportion of recurrent alterations
(≥2 subjects) among all of the identified
alterations, and (2) could be 100% confirmed by
quantitative real-time PCR analyses in eight candidate
regions of deletion and four regions of gain.
Based on these comparisons, we then selected the
working criteria, one set each for deletions and gains,
for use as the primary analyses in this study. For
deletions, the working criteria are a minimum of four
consecutive SNPs with at least three of them having
the following characteristics: the GSACN and
SPCN ratios of tumor/matched normal <0.75; the
GSACN of the tumor DNA <1.9 for autosomal
chromosomes, or <0.9 for the X chromosome; and the
minimum physical length of the putative deletion
>2 kb. For gains, the working criteria are a minimum
of four consecutive SNPs with at least three of
them having the following characteristics: the
GSACN and SPCN ratios of tumor/matched normal
>1.4; the GSACN of the tumor DNA >2.7 for
autosomal chromosomes, or >1.7 for the X chromosome;
and the minimum physical length of the putative
gain >2 kb.
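
One way to read these working criteria is as a sliding-window rule over SNPs ordered by position. The Python sketch below illustrates that reading under stated assumptions; it is not the in-house script used in the study, which is not described in detail. The names `Snp`, `supports`, and `call_regions` are hypothetical, the merging of overlapping windows is an assumption, and the 2-kb minimum is applied to the distance between the first and last SNP of a window.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    position: int        # genomic position in bp
    gsacn_tumor: float   # smoothed copy number, tumor
    gsacn_normal: float  # smoothed copy number, matched normal
    spcn_tumor: float    # single-point copy number, tumor
    spcn_normal: float   # single-point copy number, matched normal

def supports(snp, kind, on_x=False):
    """True if one SNP meets the per-SNP thresholds for a deletion or a gain."""
    if snp.gsacn_normal <= 0 or snp.spcn_normal <= 0:
        return False  # ratio undefined; treat as non-supporting (assumption)
    gs_ratio = snp.gsacn_tumor / snp.gsacn_normal
    sp_ratio = snp.spcn_tumor / snp.spcn_normal
    if kind == "deletion":
        limit = 0.9 if on_x else 1.9
        return gs_ratio < 0.75 and sp_ratio < 0.75 and snp.gsacn_tumor < limit
    if kind == "gain":
        limit = 1.7 if on_x else 2.7
        return gs_ratio > 1.4 and sp_ratio > 1.4 and snp.gsacn_tumor > limit
    raise ValueError(f"unknown kind: {kind}")

def call_regions(snps, kind, on_x=False, window=4, min_hits=3, min_length_bp=2000):
    """Return (start, end) regions on one chromosome; snps must be sorted by position."""
    flags = [supports(s, kind, on_x) for s in snps]
    regions = []
    for i in range(len(snps) - window + 1):
        if sum(flags[i:i + window]) < min_hits:
            continue  # fewer than three of four consecutive SNPs qualify
        start, end = snps[i].position, snps[i + window - 1].position
        if end - start < min_length_bp:
            continue  # shorter than the minimum physical length
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # merge with the previous window
        else:
            regions.append((start, end))
    return regions

if __name__ == "__main__":
    # Toy data: seven SNPs 1 kb apart, the middle four at half the normal dosage.
    toy = [Snp(10_000 + 1_000 * i, cn, 2.0, cn, 2.0)
           for i, cn in enumerate([2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0])]
    print(call_regions(toy, "deletion"))
    print(call_regions(toy, "gain"))
```

Because a window qualifies when three of its four SNPs meet the thresholds, the reported boundaries in this sketch are window edges and can include one flanking SNP that does not itself meet the thresholds.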

Comprehensive  Assessment  of  Deletions 
and  Gains  in  Prostate  Tumors 

We applied these two working criteria to examine
the DNA CN data of 100K SNPs among 22 paired
tumor/normal DNA samples. As shown in Supplementary
Figure 1, we observed several large-scale
CN alterations that are consistent with previous
findings from cytogenetic and CGH studies, including
deletions at 5q, 6q, 8p, 10q, 12p, 13q, and 16q
and gains at 3, 7, and 8q. In addition, we were able
to detect a number of smaller-scale CN alterations.
We found 863 putative deletions and 495 putative
gains in these tumor genomes, of which 355 deletions
(41%) and 223 gains (45%) are recurrent. The
chromosomal band, the starting and ending positions,
the number of SNP probes involved, the number
of tumors involved, and the known genes within
the regions of all of the recurrent alterations are
presented in Supplementary Table 1a and 1b. The
altered regions where at least four tumors were
involved are presented in Table 2. The most frequently
deleted regions were 10q23.3 and 13q21.31,
where seven tumors showed reductions in DNA CN
compared to their normal counterparts, consistent
with both hemizygous and homozygous deletion
(see later). The other commonly deleted regions
were 3q26.33, 4q28.3, and 12q21.32 among six
tumors, and 6q14.3, 8p22 (at ~14 Mb), 8p22 (at
~16 Mb), 8p11.1, 8p11.2, 16q22.1, 13q31.1, and
16p11.2, among five tumors. The most frequently
observed gains were detected on 11q13.5, where
seven tumors were involved. The other regions with
common gains were 7q22.1 and 16q22.1, where six
tumors were involved, and 1p36, 7p22.2, and
22q13.31, where five tumors were involved.

Many of these frequent alterations have been
previously reported by various methods. In fact, all
20 frequent deletions (≥10%) on autosomal chromosomes
described in the review by Dong (2001)
were observed in our study and were found in at
least two of our samples. In addition, we identified
many regions that have not been previously
reported as frequent changes. For example, among
the 43 regions where at least four of the 22 tumors
(≥18%) were found in our study to contain a deletion
(Table 2), seven of these regions were novel,
including 3q26.33, 4q32.2, 4q34.1, 5q12.2, 6q24.3,
9q31.1, and 13q31.1. As expected, the sizes of these
seven novel deleted regions were smaller (median
size of 206,587 bp) than those of 37 previously
known deleted regions (median size of 392,609 bp).
This difference in size was statistically significant,
P = 0.01 (nonparametric rank test), thus demonstrating
the advantage of this high-resolution
method in detecting smaller alterations.

Importantly, we found significant differences in
the number of deletions and gains among tumors
of different Gleason scores (Fig. 2, Table 3). For
tumors with Gleason 6-8, we observed a trend of
more DNA CN alterations in the tumors with
higher Gleason sums. For example, the median
numbers of deletions were 17.5, 50, and 205 for the
tumors of Gleason 6, 7, and 8, respectively. The
median numbers of gains were 1.5, 35, and 94 for
the tumors of Gleason 6, 7, and 8, respectively. It is
interesting to note that somatic CN alterations are
less common in the two Gleason 9 tumors in our
study. An average of 25.5 deletions and no gains
were found in these two subjects. While these
observations were based on only two subjects, this
trend appeared to hold in our analysis of a larger
number of Gleason 9 samples at the PTEN locus
(see the section below “PTEN hemizygous and
homozygous deletion in primary prostate tumors”).
An examination of the tumor histology for the
specimens from which these two DNA samples
were isolated indicates that this lower frequency of
CN alterations was not due to relatively lower tumor
purity of these samples (data not shown). Further
studies in a larger number of samples are
needed to obtain a better estimate of genomic CN
alterations in the tumors of Gleason 9 or higher.

Genes  Implicated  in  the  Regions 
of  DNA  CN  Alterations 

While multiple recurrent DNA CN alterations
were located between genes, more than half of
them (58%) involved genes, either completely or
in part (Table 4). Specifically, the vast majority
(74%) of the regions with recurrent CN gains
involved genes; 571 known genes were located in
the 223 recurrent gained regions. In contrast, only
about one half (49%) of the recurrent deleted
regions involved genes; 459 known genes were
located in the 355 recurrent deleted regions. The
difference in the proportion of alterations involving
genes between gain and loss events was statistically
significant (Z = 6.00, P < 0.000001).
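
For readers who want to follow the two-sample proportion test, the Python sketch below shows a pooled two-proportion Z statistic with a two-sided P value. It is an illustration only: the source does not state which variance estimate was used, the function name `two_proportion_z` is hypothetical, and the counts are reconstructed here from the rounded percentages (74% of 223 gains, 49% of 355 deletions), so the result is close to, but not expected to equal exactly, the reported Z.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z test for proportions; returns (z, two-sided P)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

if __name__ == "__main__":
    # Counts reconstructed from rounded percentages (an assumption, see lead-in)
    gains_in_genes = round(0.74 * 223)
    deletions_in_genes = round(0.49 * 355)
    z, p = two_proportion_z(gains_in_genes, 223, deletions_in_genes, 355)
    print(f"Z = {z:.2f}, two-sided P = {p:.2e}")
```

With these reconstructed counts the statistic lands near the reported Z = 6.00, and the two-sided P value is well below 0.000001.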

To further test whether the DNA CN alterations
preferentially target genes, we compared the proportion
of the altered SNP probes that are located
within genes with that of all of the SNP probes on
the 100K array. Among the 116,204 SNPs, 41,959,
corresponding to 36%, are located within genes, as
defined by the termini of the 5' and 3' UTRs. In
comparison, both recurrent gains (74%, Z = 11.76,
P < 0.00001) and deletions (49%, Z = 4.94, P <
0.00001) were significantly more likely to occur
within genes than the average in the genome
(36%). When the proportions of SNPs that are
located within genes were estimated separately for
alterations as a function of their frequency of
occurrence in multiple tumors, we found that the
proportions for SNPs involved in alterations
observed in ≥4 subjects were significantly
higher than the average in the genome (36%): 45%
for the deletions (Z = 2.65, P = 0.008), and 47%
for the gains (Z = 2.90, P = 0.004). These results
indirectly suggest that recurrent CN alterations,
both gain and loss, preferentially target gene-containing
intervals, consistent with a role for selection
of cells with increasing numbers of specific
gene dosage alterations as prostate cancers initiate
and progress.

PTEN  Hemizygous  and  Homozygous  Deletion 
in  Primary  Prostate  Tumors 

The  high-resolution  SNP  array  provides  an 
excellent  tool  to  better  define  altered  regions 
TABLE 2. Putative Deleted and Amplified Regions Identified in 22 Primary Tumors Using 100K SNP Mapping Panel

                  Implicated region (bp)                     No. of       No. of
Chromosomal                                                  implicated   implicated   Known genes
band              Start          End            Size         SNPs         tumors       in the regions (UCSC)

Deleted regions
10q23.2-23.31     89,308,314     90,653,819     1,345,505    84           7            PAPSS2, PTEN, ATAD1, C10orf59, LIPF, ANKRD22
13q21.31          60,860,097     62,682,287     1,822,190    81           7            PCDH20
3q26.33           182,126,289    182,248,076    121,787      5            6            FXR1, DNAJC19
4q28.3            136,423,989    136,677,252    253,263      16           6            -
12q21.32          86,008,606     86,199,128     190,522      10           6            -
8p11.1            43,218,212     43,312,864     94,652       6            5            POTE8
6q14.3            86,797,435     87,171,740     374,305      14           5            -
8p22              14,201,821     14,903,851     702,030      77           5            SGCZ, TUSC3
8p22              16,181,469     16,758,362     576,893      38           5            -
8p12-11.21        37,608,457     40,420,045     2,811,588    60           5            ZNF703, SPFH2, PROSAC, GPR124, BRF2, RAB11FIP1, ADRB3, EIF4EBP1, ASH2L, STAR, LSM1, BAG4, DDHD2, PPAPDC1B, WHSC1L1, LETM2, AF173898, FGFR1, FLJ43582, TACC1, HTRA4, TM2D2, ADAM9, ADAM32, AK129810, BC026083, BC047448, BC067864, AK128178, ADAM18, ADAM2, INDO, INDOL1, C8orf4
13q31.1           80,591,525     80,711,441     119,916      10           5
16q22.1           67,987,772     68,461,639     473,867      8            5            CYB5B, NFAT5, NQO1, NOB1P, WWP2
16q22.3-23.1      72,745,821     73,507,260     761,439      19           5            PSMD7, AK124154, LOC497190, MGC34761, LOC348174, GLG1, AK131501, RFWD3, FA2H, WDR59
2q22.2            143,455,911    143,574,902    118,991      5            4            KYNU
17q21.31          39,876,004     40,523,389     647,385      5            4            KIAA0553, FZD2, CCDC43, DBF4B, AY358101, ADAM11, GJA7, HIGD1B, EFTUD2, LOC388389, GFAP, AK124465, C1QL1, DCAKD, NMT1
4q34.1            176,402,528    176,589,178    186,650      6            4
5q12.2            63,386,968     63,692,865     305,897      5            4            RNF180, BC101279
3p12.2            83,456,290     83,694,096     237,806      7            4            -
4q32.2            163,606,393    163,880,053    273,660      16           4            -
5q21.3            104,253,771    105,557,038    1,303,267    29           4            -
5q23.1            119,189,176    119,485,637    296,461      10           4            -
6q14.1            80,611,908     80,870,799     258,891      18           4            ELOVL4, TTK
6q16.1            95,413,900     96,043,261     629,361      15           4            -
6q21              110,047,191    110,226,824    179,633      14           4            C6orf199, KIAA0274
6q24.3            146,164,446    146,394,383    229,937      6            4            FBXO30, SHPRH, GRM1
8p23.2            4,852,294      4,993,234      140,940      12           4            -
8p22              15,060,320     15,328,979     268,659      14           4            -
8p22              15,378,001     15,700,057     322,056      25           4            TUSC3
8p22              17,855,816     18,169,577     313,761      23           4            PCM1, ASAH1, NAT1
8p21.2            27,633,311     28,212,897     579,586      30           4            FLJ10853, SCARA3, MGC45780, PBK, ELP3, PNOC
8p12              32,318,980     32,632,781     313,801      32           4            NRG1
8p12              35,009,528     35,271,112     261,584      14           4            -
9q31.1            102,478,986    102,685,573    206,587      9            4            -
13q14.12          45,310,109     45,731,718     421,609      26           4            KIAA0853, CPB2, LCP1, LOC220416
13q21.1           52,895,341     53,484,638     589,297      36           4
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TABLE  2.  Putative  Deleted  and  Amplified  Regions  Identified  in  22  Primary  Tumors  Using  1 00K  SNP  Mapping  Panel  (Continued) 


Implicated  region  (bp) 

No.  of 
implicated 

Known  genes 
in  the  regions  (UCSC) 

Chromosomal  band 

Start 

End 

Size 

SNPs  SNPs 

I3q2!.l 

54,585,799 

54,780,048 

194,249 

15 

4 

_ 

1 3q2 1 . 1 

56,263,149 

57,516,869 

1,253,720 

42 

4 

FLJ40296,  PCDH 1 7 

I3q2l.32 

65,751,760 

66,201,555 

449,795 

36 

4 

PCDH9 

1 3q2 1 .33 

70,000,334 

70,41  1,247 

410,913 

24 

4 

— 

1 3  q2 1 .33 

70.639,393 

71,364,849 

725,456 

45 

4 

DACHI 

1 3q2 1 .33 

71,438,158 

71,624,921 

186,763 

14 

4 

— 

I3q22.l 

72,183,522 

72,836,342 

652,820 

28 

4 

AK025522,  FLJ22624,  AK09S4I0, 

Kl AA 1 008,  C 1 3orf24,  KLF5 

1 6q23.  [ 

Amplified  regions 

74,948.793 

75,095,527 

146,734 

18 

4 

CNTNAP4 

1  Iql  3.5 

75,434,322 

75,922,141 

487,819 

6 

7 

UVRAG,  WNTf  1,  PRKRIR, 
BC040665,  LOC387790,  EMFY 

7q22.l 

101,238,531 

101,259,615 

21,084 

4 

6 

CULTI 

I6q22.l 

66,582,35 1 

67,026,799 

444,448 

7 

6 

DPEP2,  CR625664,  DOX28.DUS2L, 
NFATC3,  RBM35B,  LYPLA3, 
SLC7A6,  SLC7A60S,  PRMT7, 

AK 1 23945,  SMPD3,  AK 1 28261 

Ip36 

2,960,027 

3,326,028 

366,001 

12 

5 

PRDMI6 

7p22.2 

2,545,672 

2,779,870 

234,198 

12 

5 

C70rf27,  IQCE,  TTYH3,  AM2I, 

GAN  12 

22q  13.31 

42,873,845 

43,132,573 

258,728 

6 

5 

PARVB,  PARVG,  BC 104 183 

7q36.3 

154,307,728 

154,780,169 

472,441 

6 

4 

DPP6,  PAXIPI,  HTR5A,  INSIGI 

8q  1 3.3 

71,028,222 

71,500,739 

472.517 

22 

4 

FROM  14,  NCOA2 

1  p36.3 1 

5941712 

6666834 

725 1 22 

7 

4 

NPNP4,  KCNAB2,  CHDS, 
AK0942I9,  RPL22,  FLJ46380, 
BC034459,  C 1  orf  1 88.  AK  1 28450, 
ICMT,  MGC40I68,  HES3, 

GPRI 53,  ACOT7,  AY358 1 79, 
HES2,  ESPN,  TNFRSF25, 
PLEKHG5,  NOL9, TASIRI, 

HKR3,  KLHL2I,  PHFI3,  THAP3, 
DNAJCI 1 

Iq42.3 

232938058 

233126758 

188700 

17 

4 

AF 193050 

3q22.3 

139573887 

140078942 

505055 

II 

4 

MARS,  FAM62C,  CEP70,  FAIM, 
PIK3CB 

3q26.2 

170988843 

17 16591 51 

670308 

12 

4 

LRRC34,  LRRC3I.SAMD7,  TLOCI, 
GPRI 60,  PHC3,  AK095225, 

PRKCI,  SKIL,  CLDN 1 1 

3q26.32 

177666873 

178105323 

438450 

13 

4 

— 

5p  15.33 

3101932 

3189229 

87297 

7 

4 

— 

Sq35.3 

1802261 1  1 

180607628 

381517 

8 

4 

BTNL8,  BTNL3,  BTNL9,  TRIM7, 
TRIM4I,  AX775803,  AX775797, 
AX775789,  AX77579 1 ,  GNB2L 1 

7pf5.2 

25276343 

25599789 

323446 

29 

4 

— 

7p  1 2.3 

47225158 

47832160 

607002 

13 

4 

TNS3,  AK  126096,  BC007354, 

PKDI  LI,  FL2I075 

7pl  1.2 

56479844 

57174263 

694419 

7 

4 

LOC40I357 

7ql  1.21 

62940814 

63362316 

421502 

1  1 

4 

ZNF679 

7q  11.23 

76478717 

76636664 

157947 

10 

4 

LOC38957 1 ,  Kl  AA  1 505 

7q2 1.12 

866477 1 7 

87296218 

648501 

22 

4 

DMTFi,  C7orf23,  TP53API ,  CROT, 
ABCB4,  ABCBI,  RPIB9 

7q36.3 

157034271 

158246990 

1212719 

8 

4 

PTPRN2,  AK  1 26705,  AK057320, 

AK  127222,  LUZP5,  FAM62B 

8q  1  1 .23 

53174241 

53532128 

357887 

21 

4 

STI8 

8q  1 2.3 

64607420 

64947590 

340170 

II 

4 

— 

8q  1 3.2 

69108760 

69369253 

260493 

21 

4 

DEPDC2 

(Continued) 
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TABLE  2.  Putative  Deleted  and  Amplified  Regions  Identified  in  22  Primary  Tumors  Using  1 00K  SNP  Mapping  Panel  (Continued) 


Implicated  region  (bp) 

No.  of 
implicated 

Known  genes 
in  the  regions  (UCSC) 

Chromosomal  band 

Stare 

End 

Size 

SNPs 

SNPs 

8q2 1 . 1  1 

75495378 

75880359 

384981 

12 

4 

_ _ 

8q22. 1 

9691871 1 

97386585 

467874 

14 

4 

GDF6,  UQCRB,  MTERFDI,  PTDSS1 

8q24.2 1 

12751 1831 

128258556 

746725 

48 

4 

FAM84B 

8q24.2 1 

129018405 

129501894 

483489 

34 

4 

TMEM75 

8q24.22 

134424597 

135082614 

658017 

31 

4 

ST3GLI ,  AK0022I0 

8q24.22-24.23 

136301294 

136599508 

298214 

8 

4 

KHDRBS3 

i  Op  15.3 

284953 

737115 

452162 

9 

4 

2MYNDI  1  ,DQ3354S5,  DIP2C, 

AK 1 30224,  C 1  Oorf  1 06, 

AK0960I3 

20q  13.31 

54998112 

55225345 

227233 

7 

4 

BMP7 



Figure  2.  Whole-genome  comparison  of  genomic  aberrations  among  prostate  tumors  with  different 
Gleason  scores  at  different  pathological  stages.  Chr,  chromosome.  Green  arrow,  deletion.  Red  arrow, 
amplification.  Only  graphically  obvious  aberrations  are  marked.  [Color  figure  can  be  viewed  in  the  online 
issue,  which  is  available  at  www.interscience.wiley.com.] 


shared by multiple tumors, thereby improving the
ability to identify specific genes that may be driving
the selection of these alterations. We examined
the genes implicated at 10q23, the most commonly
deleted region in the current study population.
Although the entire affected region spanned ~1.3
Mb, only two in-gene SNP probes, spanning
10,053 bp, were deleted in each of these seven
tumors (Figs. 3A-C). PTEN was the only gene
residing between these two SNPs.
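The narrowing from a ~1.3 Mb affected region to a 10,053 bp interval amounts to intersecting the per-tumor deletion calls. A minimal sketch of that step follows; the function name `common_region` and the three example intervals are hypothetical illustrations, not the study's actual calls or software.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp), inclusive


def common_region(deletions: List[Interval]) -> Optional[Interval]:
    """Return the interval shared by every deletion, or None if none exists."""
    if not deletions:
        return None
    start = max(s for s, _ in deletions)  # latest start among all tumors
    end = min(e for _, e in deletions)    # earliest end among all tumors
    return (start, end) if start <= end else None


# Hypothetical per-tumor deletion calls on chromosome 10 (bp); illustrative only.
tumor_deletions = [
    (89_308_314, 89_900_000),
    (89_600_000, 90_653_819),
    (89_650_000, 89_700_000),
]
shared = common_region(tumor_deletions)
print(shared)
```

With real calls, the shared interval would then be compared against gene annotations to list the genes it contains.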

To confirm the deletion status at PTEN, we performed
a qPCR analysis for these 22 pairs of tumor
and normal DNA samples using a probe at position
89,675,748 bp, between the two implicated SNPs.
The results from the qPCR analysis were consistent
with those from the Affymetrix SNP mapping
method. The differences in the Ct values (ΔCt)
between tumor and matched normal among these
subjects were >0.51, suggesting a PTEN deletion
in these tumors. Among the 7 PTEN deleted
tumors, four were Gleason 8, two were Gleason 7,
and one was Gleason 6 (Fig. 3).

To better estimate the frequency of the PTEN
deletion in primary prostate tumors, we performed
qPCR analysis in another population of 69 tumors
that had not been analyzed by the 100K SNP array.
Among a total of 91 primary prostate tumors examined,
32 tumors (32/91 = 35%) had the deletion,

TABLE 3. DNA Copy Number Alterations Among Tumors with Different Gleason Sum

Gleason sum | No. of subjects | Median copy number alterations (deletions / gains / both) | Median recurrent copy number alterations (deletions / gains / both)
6 | 8 | 17.5 / 1.5 / 26.5 | 5.5 / 0.5 / 7
7 | 8 | 50 / 35 / 74 | 25 / 15 / 44
8 | 4 | 205 / 94 / 299 | 76.5 / 41.5 / 118
9 | 2 | 25.5 / 0 / 25.5 | 2 / 0 / 2
P-value(a) | | 0.04 / 0.006 / 0.02 | 0.005 / 0.01 / 0.01

(a) Kruskal-Wallis test was performed to test if there was a statistically significant difference in the number of genetic alterations among the four Gleason
score groups.
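For readers unfamiliar with the Kruskal-Wallis test named in the footnote, the sketch below computes its H statistic (average ranks, with the standard tie correction) in plain Python. The per-tumor counts are hypothetical placeholders chosen only to match the group sizes in Table 3 (8, 8, 4, 2); they are not the study data, and the function name is an assumption for illustration.

```python
from itertools import chain
from typing import Dict, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic using average ranks and tie correction."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of: Dict[float, float] = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        t = j - i                             # size of this tie block
        tie_term += t**3 - t
        i = j
    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)    # equals 1 when there are no ties
    return h / correction


# Hypothetical alteration counts per tumor, grouped by Gleason sum (not study data).
counts_by_gleason = {
    6: [10, 12, 15, 17, 18, 20, 22, 25],
    7: [30, 35, 45, 50, 50, 60, 70, 80],
    8: [150, 190, 220, 260],
    9: [20, 31],
}
h_stat = kruskal_wallis_h(list(counts_by_gleason.values()))
# With four groups, H is referred to a chi-square distribution with 3 degrees
# of freedom; 7.815 is its 0.05 critical value.
print(h_stat, h_stat > 7.815)
```

With so few tumors per group, the chi-square reference is only an approximation, so the result is best read as a rough guide.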


TABLE 4. Proportions of Recurrent Alterations Involving Genes

Type of alterations | No. of alterations | No. (%) of alterations involving genes | No. of genes involved in alterations

Deletions observed in
≥4 subjects | 44 | 25 (56.82%) | 121
3 subjects | 68 | 36 (52.94%) | 110
2 subjects | 243 | 112 (46.09%) | 228
Subtotal | 355 | 173 (48.73%) | 459

Gains observed in
≥4 subjects | 33 | 28 (84.84%) | 127
3 subjects | 48 | 42 (87.50%) | 129
2 subjects | 142 | 95 (66.90%) | 315
Subtotal | 223 | 165 (73.99%) | 571

Total | 578 | 338 (58.47%) | 1,030

defined as ΔCt > 0.51 (Table 5). Importantly, we
continued to observe higher percentages of PTEN
deletion in the tumors of Gleason 8 (7/10 = 70%)
and Gleason 7 (21/51 = 41%), compared with the
tumors of Gleason 6 (2/18 = 11%). Interestingly,
consistent with the overall pattern of fewer somatic
CN deletions and gains in the two Gleason 9 tumors
observed in our study, the PTEN deletion was relatively
uncommon in the 12 Gleason 9 tumors, as
only two subjects had a deletion (2/12 = 17%).

We attempted to further differentiate homozygous
deletions from hemizygous deletions based
on the ΔCt of the qPCR analysis between tested
tumor and normal DNA samples. Because of various
levels of normal DNA contamination in these
macrodissected primary tumor DNA samples, the
classification of homozygous and hemizygous deletions
based on ΔCt was subject to errors. However,
assuming a minimum of 25% normal DNA contamination
in our tumor DNA, the maximum ΔCt for a
hemizygous deletion is 0.68. Therefore, tumors
with ΔCt > 0.68 were consistent with a homozygous
deletion. Among the 32 PTEN deleted
tumors, 13 could be classified as homozygous deletions
(Table 5, Fig. 4). Interestingly, all of the
PTEN homozygous deletions were found in Gleason
8 and 7 tumors, including six Gleason 8 tumors
(6/10 = 60%) and seven Gleason 7 tumors (7/51 =
14%). None of the eighteen Gleason 6 tumors were
classified as homozygous deletions, and neither
were any of the twelve Gleason 9 tumors.
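The 0.68 cutoff can be reproduced with a short calculation. The sketch below assumes ideal PCR efficiency (template doubles each cycle) and a simple mixture of tumor and normal cells; the function names and this mixture model are illustrative assumptions, not a description of the authors' software.

```python
import math


def expected_delta_ct(tumor_copies: int, normal_fraction: float,
                      normal_copies: int = 2) -> float:
    """Expected Ct shift (tumor sample minus normal sample) at one locus.

    Assumes perfect amplification efficiency, so halving the template
    raises Ct by exactly one cycle.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be between 0 and 1")
    mean_copies = ((1 - normal_fraction) * tumor_copies
                   + normal_fraction * normal_copies)
    if mean_copies <= 0:
        return math.inf  # no template left at this locus
    return -math.log2(mean_copies / normal_copies)


def classify(delta_ct: float, deletion_cutoff: float = 0.51,
             homozygous_cutoff: float = 0.68) -> str:
    """Apply the two ΔCt cutoffs used in the text."""
    if delta_ct > homozygous_cutoff:
        return "homozygous"
    if delta_ct > deletion_cutoff:
        return "hemizygous"
    return "normal"


# Hemizygous deletion with 25% normal DNA: 0.75*1 + 0.25*2 = 1.25 copies,
# a ratio of 0.625 to normal, giving a ΔCt of about 0.68.
hemi_25 = expected_delta_ct(tumor_copies=1, normal_fraction=0.25)
# Homozygous deletion with the same contamination: ratio 0.25, ΔCt = 2.
homo_25 = expected_delta_ct(tumor_copies=0, normal_fraction=0.25)
print(round(hemi_25, 2), round(homo_25, 2))
```

Under the same model, more contamination lowers ΔCt, so a heavily contaminated homozygous deletion can fall below 0.68 and be called hemizygous, which is the classification error the text warns about.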

Common  Deletions  Between  ERG  and  TMPRSS2 

The high-resolution SNP array also provides an
excellent tool for discovering novel alterations.
One novel region of frequent recurrent deletion, at
21q22.2, was of particular interest. Using 500K
GSACN, we found that six tumors had apparent
deletions between 38 Mb and 41.7 Mb, corresponding
approximately to the interval between
the genes ERG and TMPRSS2, recently demonstrated
to be involved in common gene fusion
events in prostate cancer (Tomlins et al., 2005).
The GSACN ratio of tumor/normal of these six
subjects is presented in Figure 5. On the basis of
the data from GSACN and SPCN analyses, the
boundaries of the deletions at the telomeric side
appear to be within a small region, at least among
five of the six tumors, involving four SNPs and
spanning 217,769 bp. While the telomeric boundary
may reside within TMPRSS2, it is difficult to
infer the exact breakpoint because no SNP on the
100K array resides within TMPRSS2. In contrast,
the boundaries of the deletions on the centromeric
side appear to vary among the six tumors, involving
99 SNPs that span 1,119,680 bp. The deletion
boundaries in three tumors are consistent with
breakpoints within ERG.

DISCUSSION 

DNA CN alterations may provide important
clues in identifying tumor suppressor genes and
oncogenes. The ability to identify causal genes in
the altered DNA CN regions largely depends on
the size of the altered region, which is partially related
to the resolution of the methods used to detect the DNA
CN alterations. In this study, we demonstrated that

Figure 3. Analysis of DNA copy number alterations on Chromosome 10. (A)
Scatter plot of GSACN ratio of tumor/normal at each SNP locus from 89.4 to 90.7
Mb to display the alterations of DNA copy numbers at 10q23 among 22 prostate cancer
subjects, each of which is labeled with Gleason score-tumor ID. Horizontal bars
indicate the physical positions of PAPSS2, PTEN and LIPF. (B) Putative deletions
detected in 10 tumors. Each deletion is plotted as a vertical bar at the specific physical
location on chromosome 10. (C) Overlapping analysis of recurrent deletions from
89.3 to 90.1 Mb on the physical map among seven subjects. Black vertical bars represent
the location of each SNP. Deleted SNPs are highlighted in red. Horizontal bars indicate
the physical positions of PAPSS2 and PTEN. Dotted vertical lines mark the maximum
overlapping region deleted within PTEN, with arrows indicating the size of the overlapping
deletion. [Color figure can be viewed in the online issue, which is available at
www.interscience.wiley.com.]


the Affymetrix 100K SNP mapping array, with an average
resolution of one SNP in 24 kb, provides a high-resolution
method to detect DNA CN alterations
in the tumor genome. We obtained several important
findings in our study by using this new
method to examine 22 pairs of primary prostate


cancers and matched normal tissues. We detected
355 recurrent deletions and 223 recurrent gains in
these tumor genomes, many of which were novel,
particularly the smaller alterations. We
found significantly higher numbers of DNA CN
alterations in tumors with higher Gleason scores

TABLE 5. Putative Homozygous and Hemizygous PTEN Deletions

Gleason sum | No. of subjects | Normal, No. (%) | Hemizygous, No. (%) | Homozygous, No. (%)
6 | 18 | 16 (88.89) | 2 (11.11) | 0 (0)
7 | 51 | 30 (58.82) | 14 (27.45) | 7 (13.73)
8 | 10 | 3 (30.00) | 1 (10.00) | 6 (60.00)
9 | 12 | 10 (83.33) | 2 (16.67) | 0 (0)
Total | 91 | 59 (64.83) | 19 (20.88) | 13 (14.29)

Figure 4. Validating PTEN deletion by real-time quantitative PCR.
Ct values of the control (x-axis) and test (y-axis) amplicons for the
three dilutions of each DNA sample were plotted against each other,
and the offsets between best-fit lines for the samples along the test-amplicon
axis at 25 Ct of the control-amplicon axis were measured.
The offsets in the Ct values (ΔCt) between tumor DNAs with putative
PTEN deletions and tumor DNAs with known normal PTEN are used
to infer DNA copy number. A tumor DNA is defined as having PTEN
hemizygous deletion when the ΔCt is less than 0.68 and as having PTEN
homozygous deletion when the ΔCt is more than 0.68, assuming 25%
normal DNA contamination. [Color figure can be viewed in the online
issue, which is available at www.interscience.wiley.com.]


among tumors with Gleason sums 6, 7, and 8. We
observed that the most frequent DNA CN alterations
of both types (deletion and gain) preferentially
occurred within genes. Finally, we observed a
novel and recurrent deletion between the ERG
and TMPRSS2 genes on chromosome 21, presumably
related to the recently identified formation of
fusion transcripts from these two genes. These
findings demonstrated the advantages of this high-resolution
method in detecting DNA CN alterations and provided
important clues for further classifying heterogeneous
prostate tumors and identifying
genes important in tumorigenesis.

Another important feature of our study is that
we demonstrate the feasibility of SNP mapping
arrays in detecting CN alterations in DNA isolated
from macrodissected tumor tissues. While microdissection
of tissue samples using techniques such
as laser capture has the advantage of yielding
potentially more homogeneous samples, with little
contamination from nonneoplastic cells, this
approach provides only limited amounts of DNA
for repeat and subsequent follow-up analyses,
and the results obtained reflect only the characteristics
of the small number of cells analyzed.
While being more susceptible to contamination by
normal cells, the macrodissection method used in
this study provides a larger quantity of DNA and
reflects the average genetic makeup of a much
larger number of cells. Thus, while CN alterations
that may be specific for a small subset of cancer
cells may not be detected, our detection of hundreds
of gains and deletions, affecting both documented
and novel loci, emphasizes the usefulness
of this approach to efficiently identify and characterize
recurrent CN changes present in the majority
of tumor cells within a given prostate cancer
lesion. To address the question of genetic heterogeneity
within prostate cancers, additional studies
using high-resolution SNP arrays to characterize
multiple DNA samples isolated from separate,
small cell populations are needed and should be
highly informative.

Figure 5. Analysis of DNA copy number alterations on Chromosome 21. The tumor/normal ratios of
GSACN at each SNP locus from 38 to 43 Mb are plotted to illustrate the alterations of DNA copy numbers
in six tumors. The tumor/normal ratios of GSACN of many SNPs in the regions between the two dotted vertical
lines indicate deletions between ERG and TMPRSS2, whose positions are marked as horizontal bars.
[Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

It is interesting to note that we did not use genotype
information of the SNP arrays to perform
LOH analysis. Theoretically, the ability to compare
SNP genotypes of matched tumor and normal
DNA is an advantage of our approach, and LOH information
is critical in defining deletions and allelic
imbalance alterations. However, in practice we
found that Affymetrix SNP genotyping is extremely
sensitive and is able to detect very small numbers
of alleles; therefore this approach is not suitable for
LOH analyses in studies such as ours, where there
may be various amounts of normal DNA contamination
in macrodissected tumor DNA.

Although we have demonstrated the utility of the
100K SNP mapping array in detecting DNA CN
alterations in the genome, there remain many challenging
issues regarding the accuracy of this
method. One of the most important practical issues
is the quality of allele intensity data of SNP
probes, the basis for estimating DNA CN. The
quality of allele intensity data, however, can be
indirectly measured by SNP call rate. A low SNP
call rate, for example <95%, is an indication of
poor quality of allele intensity and/or contamination
by other DNA sources, and such arrays are therefore not
suitable for the analysis of DNA CN (data not
shown). We therefore did not use any data from
the arrays with a SNP call rate <95% in this study.
Other issues that may affect the accuracy of inferring
DNA CN in tumors include potential germline
CNPs, mutations in the restriction enzyme
sites, and random noise of allele intensity at some
SNPs. We have taken several measures to minimize
the effects of these issues. We used paired tumor
and matched normal DNA and used their ratio
of DNA CNs as the primary variable to lessen the
effect of potential germline CNPs. To limit the
potential for artifact deletions due to additional
point mutations at restriction enzyme sites that
result in fragments that are too long for consistent
PCR amplification, we set a minimum physical
length of 2 kb for putative CN alterations. We used
both GSACN and SPCN in defining CN alterations
to balance the unique information against the random noise
of allele intensity at each SNP. However, these cautious
steps are not sufficient to ensure the accuracy
of this method in inferring DNA CN. Independent
confirmations using other methods such as qPCR
analysis are helpful to reduce false-positive findings
of DNA CN alterations.
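Two of the safeguards described above are simple threshold filters: excluding arrays with a SNP call rate below 95%, and discarding putative alterations shorter than 2 kb. The sketch below illustrates them; the class and function names are hypothetical, and the second example call is invented to show a rejected record.

```python
from dataclasses import dataclass
from typing import List

MIN_CALL_RATE = 0.95   # arrays below this SNP call rate were not used
MIN_LENGTH_BP = 2_000  # minimum physical length of a putative CN alteration


@dataclass
class Alteration:
    chrom: str
    start: int  # bp
    end: int    # bp
    kind: str   # "deletion" or "gain"

    @property
    def length(self) -> int:
        return self.end - self.start


def array_passes_qc(call_rate: float) -> bool:
    """Keep an array only if its SNP call rate is at least 95%."""
    return call_rate >= MIN_CALL_RATE


def keep_alterations(calls: List[Alteration]) -> List[Alteration]:
    """Drop putative alterations shorter than the 2 kb minimum."""
    return [c for c in calls if c.length >= MIN_LENGTH_BP]


calls = [
    Alteration("7", 101_238_531, 101_259_615, "gain"),     # 7q22.1 row of Table 2
    Alteration("7", 101_300_000, 101_301_500, "deletion"), # hypothetical 1.5 kb call
]
kept = keep_alterations(calls)
print(len(kept), kept[0].length)
```

The paired tumor/normal ratio and the combined use of GSACN and SPCN are separate steps and are not modeled here.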

The high resolution of the 100K SNP array facilitates
the identification of smaller CN alterations.
The fact that novel DNA CN alteration
regions identified in this study are significantly
smaller in size compared with the previously known
altered regions demonstrates the advantage of high
resolution. The fine resolution of SNP arrays also
allows us to distinguish between different alterations
within a small chromosomal region. For example,
we detected several deletions in the regions
of 13q21.1, 13q21.31, 13q21.32, and 13q21.33. These
different deletions might be detected as a single deletion
using lower-resolution methods. Therefore,
this increased resolution should be considered when
comparing the alteration frequencies obtained from
this study versus those estimated from lower-resolution
methods. Furthermore, the high resolution of
this method facilitates the identification of specific
genes that may be driving the selection of these
alterations and play roles in tumorigenesis. The fact
that we can pinpoint two SNPs that are commonly
implicated in all seven tumors in the 10q23 region,
and that PTEN is the only gene residing between these
two SNPs, provides an excellent example.
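The effect of resolution on the 13q21 deletions can be illustrated with the coordinates listed in Table 2. The sketch below uses a deliberately crude model, assumed here for illustration only: two deletions are reported as one whenever the gap between them is smaller than the method's resolution. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp)


def merge_at_resolution(regions: List[Interval], resolution_bp: int) -> List[Interval]:
    """Merge regions separated by a gap smaller than resolution_bp."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < resolution_bp:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))  # extend previous region
        else:
            merged.append((start, end))
    return merged


# Deleted regions at 13q21.1-13q21.33, coordinates taken from Table 2.
chr13q21_deletions = [
    (52_895_341, 53_484_638),
    (54_585_799, 54_780_048),
    (56_263_149, 57_516_869),
    (60_860_097, 62_682_287),
    (65_751_760, 66_201_555),
    (70_000_334, 70_411_247),
    (70_639_393, 71_364_849),
    (71_438_158, 71_624_921),
]

at_24kb = merge_at_resolution(chr13q21_deletions, 24_000)     # array's average SNP spacing
at_5mb = merge_at_resolution(chr13q21_deletions, 5_000_000)   # an assumed coarse resolution
print(len(at_24kb), len(at_5mb))
```

Under this model the eight regions stay distinct at 24 kb but collapse into a single interval at 5 Mb, which is why alteration frequencies from methods of different resolution are not directly comparable.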

The rate of PTEN deletion was observed to increase
as the Gleason score of the tumors increased
from 6 to 8, which is consistent with the reports
that PTEN loss and reduction of its expression were
highly correlated with tumors of high Gleason score
at more advanced stages (McMenamin et al., 1999;
Dreher et al., 2004; Majumder and Sellers,
2005). It is somewhat surprising that PTEN deletion
was significantly lower in Gleason 9 tumors, with
only 17%, in comparison to that in Gleason 7 and 8
tumors, with 41 and 70%, respectively (Table 5). In
the study of Gleason score distribution of chromosomal
aberrations in prostate cancer using CGH,
Chu et al. (2003) found that the frequency of
10q25-qter deletion was also much lower in Gleason
6 and 9 tumors, with 22.2 and 16.7%, respectively,
in comparison to that in Gleason 7 and Gleason
8 tumors, with 45.8 and 33.3%, respectively.

The fusion of TMPRSS2 and ERG at high frequency
has been reported in prostate cancer (Tomlins
et al., 2005; Soller et al., 2006). The region
between TMPRSS2 and ERG represents one of the
novel frequently deleted regions in prostate tumors
that we analyzed (Fig. 5, Supplementary Fig. 1,
Chr21). It is interesting to note that we did not find
evidence of the deletion between TMPRSS2 and
ERG in any of the 22 normal DNA samples, indicating
that this deletion is tumor specific and somatic
in origin. That the boundaries of the deletions
on the centromeric side appear to vary among
the different tumors is consistent with a recent
finding of multiple transcripts with different sizes
in different prostate adenocarcinoma samples
(Soller et al., 2006). Although our results suggest
that somatic deletions may be one of the mechanisms
contributing to the commonly observed ERG
and TMPRSS2 gene fusions, further studies using
higher-resolution SNP mapping arrays and other
cytogenetic and molecular methods are needed to
better define the boundaries of the deletions and to
verify if and how the deletions result in the fusion of
ERG and TMPRSS2.

In summary, we have identified both unique and
recurrent CN alterations occurring across the genome
of clinical prostate cancers using the 100K
SNP array. The increased resolution and genome-wide
nature of these data provide a comprehensive
and systematic approach to dissection of the alterations
at the levels of specific genes and/or regulatory
elements which characterize and/or may be
driving the development of prostate cancer. This
may in turn translate into more effective management
of this heterogeneous disease.